Abstract—We design distributed spectrum sensing and access strategies for opportunistic spectrum access (OSA) under an energy constraint on secondary users. Both the continuous and the bursty traffic models are considered for different applications of the secondary network. In each slot, a secondary user sequentially decides whether to sense, where in the spectrum to sense, and whether to access. By casting this sequential decision-making problem in the framework of partially observable Markov decision processes, we obtain stationary optimal spectrum sensing and access policies that maximize the throughput of the secondary user during its battery lifetime. We also establish threshold structures of the optimal policies and study the fundamental tradeoffs involved in the energy-constrained OSA design. Numerical results are provided to investigate the impact of the secondary user’s residual energy on the optimal spectrum sensing and access decisions.

Index Terms—Cognitive radio, opportunistic spectrum access, partially observable Markov decision process (POMDP), spectrum sensing.


I. Introduction

Opportunistic spectrum access (OSA), also referred to as spectrum overlay or spectrum pooling [1], is one of the approaches envisioned for dynamic spectrum management. It has received increasing attention due to its potential for improving spectrum efficiency and its compatibility with the current spectrum management policy and legacy wireless systems. The basic idea of OSA is to allow secondary users to search for and exploit local and instantaneous spectrum opportunities with limited interference to primary users. The physical platform of OSA and other dynamic spectrum access strategies is cognitive radio, which is capable of agile sensing and communication through adaptive learning [2]. As such, cognitive radio is often used as a synonym for different dynamic spectrum access strategies (see [3] for a survey of different approaches envisioned for dynamic spectrum access).

In this paper, we focus on the design of distributed medium access control (MAC) protocols for OSA under an energy constraint on secondary users. We consider secondary users, each with a finite amount of initial energy, exploiting temporal spectrum opportunities in a slotted primary system. In each slot, a secondary user either turns off its transceiver to save energy or chooses a channel in the spectrum to sense and possibly access, resulting in different levels of reduction in its battery energy. A MAC protocol governing such a sequential decision-making process thus consists of two components: i) a sensing strategy that specifies whether to sense and where in the spectrum to sense, and ii) an access strategy that determines whether to access based on the sensing outcomes regarding the occupancy state (idle or occupied by primary users) and the fading condition of the channel. The design objective is to maximize the throughput of a secondary user during its battery lifetime. We propose optimal MAC protocols for both the continuous and the bursty traffic models. For brevity, we adopt the continuous traffic model, where the secondary user always has packets to transmit, unless otherwise specified.

A. Energy-Constrained OSA Design

While optimal distributed MAC protocols for OSA have been proposed in [4], [23], [5], [24], the impact of the energy constraint on optimal sensing and access protocols has not been studied. The incorporation of the energy constraint significantly complicates the problem. First consider the sensing strategy. Without the energy constraint, the secondary user should always sense, and its channel selection should exploit the spectrum occupancy statistics to achieve the best tradeoff between gaining immediate access and gaining statistical information about the spectrum occupancy [4], [23], [5], [24]. With the energy constraint, however, the secondary user, even with packets to transmit, may choose to sleep to conserve energy. Moreover, channel selection should also exploit channel fading statistics, since a channel in deep fading requires more energy for transmission. The design tradeoff involved in sensing decisions thus lies among three often conflicting objectives: gaining immediate access, gaining spectrum occupancy information, and conserving energy.

It has been shown in [5] and [24] that, without the energy constraint, the optimal access strategy is to access if and only if the channel is sensed as idle, provided that the operating characteristics (false alarm rate versus miss detection rate) of the spectrum sensor are chosen optimally according to the interference constraint defined by the probability of collision. With the energy constraint, channel fading statistics play an important role in access decision-making. For example, when the sensed channel is idle but has a poor fading condition, should the secondary user with packets to send access this channel to gain immediate reward, or wait for better channel realizations to save transmission energy at the cost of wasting the energy already spent on sensing? Clearly, such a decision depends on the secondary user’s residual energy level and its energy consumption characteristics, as well as on the channel fading statistics.

Bursty Traffic: Bursty traffic of the secondary user further complicates the design. In this case, the design tradeoffs vary with the secondary user’s buffer state. Specifically, when the buffer is empty, the secondary user does not need to gain immediate access in the current slot. Hence, sensing decisions are made for the sole purpose of gaining statistical information about spectrum occupancy. The question raised here is whether the secondary user should continue tracking the dynamics of the spectrum for future use or turn off its transceiver to save energy. Intuitively, the sensing strategy employed by the secondary user when its buffer is empty should be fundamentally different from the one used when it has packets to transmit.

B. Main Results

Within the framework of partially observable Markov decision processes (POMDPs), we tackle the optimal MAC design for energy-constrained OSA. By modeling the primary users’ traffic as a Markov chain, we formulate the problem of dynamically choosing whether to sense, where in the spectrum to sense, and whether to access for maximum throughput as a POMDP with a finite but random time horizon. This formulation allows us to integrate the dynamics of spectrum occupancy and channel fading into the MAC design. The optimal MAC design is given by the stationary optimal policy of this POMDP, which can be solved using existing POMDP algorithms.

To gain insights into the energy-constrained OSA problem, we search for structures of the optimal sensing and access policies. We show that in the single-channel case, the optimal sensing decision (whether to sleep or sense) has a threshold structure: the secondary user should sense the channel if and only if the conditional probability that the channel is idle in the current slot (conditioned on the entire sensing and observation history) is above a certain threshold (referred to as the sensing threshold). We also show that the optimal access strategy is a threshold policy in terms of the channel fading condition. That is, the secondary user should access the channel if and only if the sensing outcome indicates that the channel is idle and its fading condition is better than a certain threshold (referred to as the access threshold). These structural results not only reveal the fundamental design tradeoffs but also reduce the computational complexity in searching for the optimal policies.

These structural results are complemented with numerical examples. We study different factors that affect the optimal sensing and access decisions. We find that the impact of the secondary user’s residual energy on the optimal decisions diminishes as the residual energy increases. This observation indicates that energy conservation plays a critical role in sensing and access decisions only when the battery of the secondary user is close to depletion. We also find that when the sensing energy consumption is large, the secondary user should be more conservative in sensing but more aggressive in access. Specifically, the secondary user should increase the sensing threshold and lower the access threshold.

Bursty Traffic: We also extend our analysis to the case where the secondary user has bursty traffic. As explained in Section I-A, the optimal sensing decisions in this case should incorporate the secondary user’s buffer state. We, however, note that due to random packet arrivals, the receiver does not know the secondary user’s buffer state. This impedes optimal distributed design since, in the absence of additional control channels, transceiver synchronization requires the secondary user and its receiver to have the same information for decision-making [4], [5], [23], [24]. We overcome this obstacle by treating the buffer state as a partially observable parameter. The secondary user and its receiver can thus make sensing decisions based on the conditional probability mass function (PMF) of the buffer state. We show that the secondary user with an empty buffer can benefit from sensing a channel if the time-correlation of the spectrum occupancy state is sufficiently large.

C. Related Work

Cognitive MAC design for OSA has been addressed under different network architectures (see [4]-[7], [23], [24], and references therein). In [6], the authors address the implementation of a MAC protocol for OSA in a GSM primary network. A dedicated control channel is required for the secondary transmitter and receiver to exchange information about channel selection. In [4] and [23], optimal distributed MAC protocols are proposed for OSA in slotted primary systems. The proposed protocols ensure synchronous hopping of the secondary transmitter and receiver in the spectrum without requiring central controllers or control channels. More recently, sensing errors have been taken into account in the MAC design [4], [5], [23], [24]. Significantly, a separation principle is established in [5] and [24] for the optimal joint design of the physical layer spectrum sensor and the MAC layer sensing and access strategies. In [7], access strategies for a slotted secondary user searching for opportunities in an unslotted primary network are considered, where a round-robin single-channel sensing scheme is used and sensing is assumed to be perfect. The joint design of the spectrum sensor and the sensing and access strategies for OSA in unslotted primary systems has been addressed in [8]. To the best of our knowledge, energy-constrained OSA design has not been considered in the literature.

Statistical models for spectrum usage of primary systems are important for OSA protocol design. Existing work along this line can be found in [9]-[11]. Measurements obtained from spectrum monitoring test-beds demonstrate the Markovian transition between busy and idle channel states in wireless LANs [9], a model similar to that used in this paper. With such active experimental research, we can perhaps foresee a public database of statistical models of spectrum usage in different bands and at different times and locations. Secondary users can then download the required model for the design of spectrum sensing and access strategies.

CHEN et al.: DISTRIBUTED SPECTRUM SENSING AND ACCESS IN COGNITIVE RADIO NETWORKS

An overview of challenges and recent developments in OSA can be found in [12].

D. Organization and Notation

The rest of this paper is organized as follows. After describing the primary and the secondary network models in Section II, we formulate the optimal MAC design for energy-constrained OSA as a POMDP over a random horizon in Section III. In Section IV, we derive recursive formulas for solving this POMDP and establish structures of the solution. We also address the distributed implementation of the obtained optimal design. In Section V, we further establish the threshold structures of the optimal sensing and access policies and study different factors that affect the optimal decisions. Finally, Section VI focuses on the energy-constrained OSA design for secondary users with bursty traffic, and Section VII concludes the paper.

Random variables and their realizations are denoted by capital and small letters, respectively. Vectors are denoted by boldfaced letters. For two equal-length vectors $\mathbf{x} = [x_1, x_2, \ldots, x_n]$ and $\mathbf{y} = [y_1, y_2, \ldots, y_n]$, we say $\mathbf{x} \geq \mathbf{y}$ if $x_k \geq y_k$ for all $k$. Let $\mathbb{1}[X]$ denote the indicator function: $\mathbb{1}[X] = 1$ if event $X$ occurs and zero otherwise.

II. Network Model

A. Primary Network Model

Consider a spectrum consisting of $N$ channels, each with potentially different bandwidth $B_n$ ($n = 1, \ldots, N$). These $N$ channels are licensed to a primary network employing a synchronous slotted communication protocol. The primary traffic is modeled as a time-homogeneous discrete Markov process. Specifically, let $S_n(t) \in \{0\ (\text{busy}), 1\ (\text{idle})\}$ denote the occupancy of channel $n$ by the primary network in slot $t$. The spectrum occupancy state (SOS), denoted by $\mathbf{S}(t) = [S_1(t), \ldots, S_N(t)]$, forms a Markov chain with state space $\mathcal{S} = \{0, 1\}^N$. The transition probabilities are denoted by

$$P_s(\mathbf{s}' \mid \mathbf{s}) = \Pr\{\mathbf{S}(t) = \mathbf{s}' \mid \mathbf{S}(t-1) = \mathbf{s}\}, \quad \mathbf{s}, \mathbf{s}' \in \mathcal{S} \tag{1}$$

which are determined by the statistics of the primary traffic and assumed known to secondary users.
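The model places no restriction on $P_s(\mathbf{s}' \mid \mathbf{s})$ beyond time homogeneity. As a minimal sketch, assume (hypothetically; this is not required by the model) that the $N$ channels evolve independently, each as a two-state Markov chain. Then (1) factors into a product of per-channel transition probabilities over the $2^N$ states. The function name `sos_transition` and the parameters `p01` and `p11` are illustrative.

```python
from itertools import product


def sos_transition(p01, p11):
    """Build Ps(s' | s) in (1) for N independent two-state channels.

    Hypothetical per-channel parameters (an assumption, not part of the model):
      p01[n] = Pr{S_n(t) = 1 | S_n(t-1) = 0}  (busy -> idle)
      p11[n] = Pr{S_n(t) = 1 | S_n(t-1) = 1}  (idle -> idle)
    Returns the list of 2^N SOS tuples and a dict P[(s, s_next)] = Ps(s_next | s).
    """
    n_channels = len(p01)
    states = list(product((0, 1), repeat=n_channels))  # 0 = busy, 1 = idle

    def channel_prob(n, prev, nxt):
        # One-channel transition probability Pr{S_n(t) = nxt | S_n(t-1) = prev}.
        p_idle = p11[n] if prev == 1 else p01[n]
        return p_idle if nxt == 1 else 1.0 - p_idle

    P = {}
    for s in states:
        for s_next in states:
            prob = 1.0
            for n in range(n_channels):
                prob *= channel_prob(n, s[n], s_next[n])  # independence => product
            P[(s, s_next)] = prob
    return states, P


states, P = sos_transition(p01=[0.2, 0.3], p11=[0.8, 0.9])
for s in states:
    # Each row Ps(. | s) should sum to one.
    print(s, sum(P[(s, s_next)] for s_next in states))
```

For correlated channels, the same dictionary layout can hold any $2^N \times 2^N$ transition law; only the construction step changes.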

B. Secondary Network Model

Consider an overlay ad hoc secondary network whose users, each following a MAC protocol, independently and selfishly search for instantaneous spectrum opportunities in these N channels. We assume that each secondary user can only sense and access one channel in a slot. At the beginning of each slot, a secondary user first determines its operation mode: sleeping or sensing. If the former, the user turns off its transceiver until the next slot. If the latter, the user chooses one channel to sense and then decides whether to access this channel based on the sensing outcome. We assume that sensing errors are negligible.

The optimal sensing and access decisions are made based on the user’s statistical knowledge of the SOS and its own residual energy. Our goal is to design the optimal sensing and access strategies that maximize the throughput of an individual secondary user during its battery lifetime.

Channel Fading Model: We adopt a block channel fading model.¹ Specifically, we assume that the channel gain between the secondary user and its receiver is a random variable independently and identically distributed (i.i.d.) across slots but not necessarily i.i.d. across channels.

Energy Model: The secondary user is powered by a battery with finite initial energy $\bar{e}$. Energy consumption in a slot may include the following: i) the energy $e_p$ consumed in the sleeping mode; ii) the energy $e_s$ consumed in sensing the channel occupancy and estimating the channel fading condition²; iii) the energy $E_{tx}(n)$ consumed in successfully transmitting over channel $n$ in a slot. In general, we have $e_p < e_s < E_{tx}(n)$. For ease of presentation, we assume that the sleeping energy and the sensing energy are constants, invariant to channel fading.

Due to hardware and power limitations, the secondary user only has a finite number $L$ of transmission power levels. We assume that the user transmits at a fixed rate. Hence, to ensure successful transmission, the user has to adjust its transmission power according to the current channel fading condition. The transmission energy consumption $E_{tx}(n)$ is thus a random variable depending on the current channel fading condition. In general, the better the channel, the lower the transmission power level. Let $\varepsilon_k$ denote the energy consumed in transmitting at the $k$th power level, with $\varepsilon_1 < \cdots < \varepsilon_L$. The PMF of $E_{tx}(n)$ is determined by channel fading statistics and is denoted by

$$P_n(k) \triangleq \Pr\{E_{tx}(n) = \varepsilon_k\}, \quad k = 1, \ldots, L. \tag{2}$$

More specifically, $P_n(k)$ is the probability that the fading condition in channel $n$ falls into an interval that requires a minimum energy of $\varepsilon_k$ for successful transmission.
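For a concrete instance of (2), assume (hypothetically) Rayleigh fading, so that the channel power gain $g$ is exponentially distributed, together with a decreasing list of gain thresholds that separate the $L$ power levels; the highest level is assumed to absorb all remaining deep fades, so the PMF sums to one. The thresholds, `mean_gain`, and the function name are illustrative assumptions, not values from this paper.

```python
import math


def power_level_pmf(gain_thresholds, mean_gain):
    """Return [P_n(1), ..., P_n(L)] for an assumed exponential power gain g.

    gain_thresholds = [g_1, ..., g_{L-1}] must be strictly decreasing; level k
    (energy eps_k, eps_1 < ... < eps_L) is used when g_k <= g < g_{k-1},
    with g_0 = +inf and g_L = 0. Better channels need lower power levels.
    """
    bounds = [math.inf] + list(gain_thresholds) + [0.0]

    def tail(x):
        # Pr{g >= x} for an exponential random variable with mean `mean_gain`.
        return 0.0 if math.isinf(x) else math.exp(-x / mean_gain)

    # Pr{bounds[k+1] <= g < bounds[k]} for k = 0..L-1, i.e., power level k+1.
    return [tail(bounds[k + 1]) - tail(bounds[k]) for k in range(len(bounds) - 1)]


pmf = power_level_pmf(gain_thresholds=[2.0, 1.0, 0.5], mean_gain=1.0)  # L = 4
print([round(p, 4) for p in pmf], round(sum(pmf), 12))
```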

Let $E(t)$ denote the secondary user’s residual energy at the beginning of slot $t$. Due to random transmission energy consumption, $E(t)$ is also a random variable taking values from a finite set $\mathcal{E}$:

$$\mathcal{E} \triangleq \Big\{ e : e = \bar{e} - \sum_{k=1}^{L} c_k \varepsilon_k - c_s e_s - c_p e_p \geq 0,\ c_s \geq \sum_{k=1}^{L} c_k;\ c_k, c_s, c_p \in \{0\} \cup \mathbb{Z}^+ \Big\} \tag{3}$$

where $c_p$, $c_s$, and $c_k$ are, respectively, the numbers of slots in which the secondary user switches to the sleeping mode, senses a channel, and transmits at the $k$th power level. Since the secondary user is required to sense a channel before accessing it in order to avoid collisions with primary users, we have $c_s \geq \sum_{k=1}^{L} c_k$.
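Because every slot removes a positive amount of energy, the finite set (3) can be enumerated directly. The sketch below assumes integer energy units and uses a hypothetical helper name. It explores the per-slot decrements $e_p$ (sleep), $e_s$ (sense without access), and $e_s + \varepsilon_k$ (sense and transmit at level $k$), which automatically enforces $c_s \geq \sum_k c_k$.

```python
from collections import deque


def residual_energy_set(e_init, e_p, e_s, eps):
    """Enumerate the set in (3), assuming integer energy units.

    Each slot removes e_p (sleep), e_s (sense, no access), or e_s + eps[k]
    (sense and transmit at level k), so every transmission is paired with a
    sensing slot. All decrements are positive, so every ordering of a feasible
    combination stays nonnegative, and a breadth-first search over the
    decrements reaches exactly the set (3).
    """
    steps = {e_p, e_s} | {e_s + x for x in eps}
    seen = {e_init}
    queue = deque([e_init])
    while queue:
        e = queue.popleft()
        for d in steps:
            nxt = e - d
            if nxt >= 0 and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(seen)


print(residual_energy_set(e_init=10, e_p=3, e_s=4, eps=[2]))  # -> [0, 1, 2, 3, 4, 6, 7, 10]
```

In this small example, residual energies 5, 8, and 9 cannot occur, which is why $\mathcal{E}$ is generally a strict subset of $\{0, 1, \ldots, \bar{e}\}$.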

Traffic Model: In Sections III-V, we adopt a continuous traffic model, i.e., the secondary user always has packets to transmit. The case where secondary users have bursty traffic is considered in Section VI.

¹Our analysis can be readily extended to a more general Markovian fading channel model. See details in Section III-B.

²An interesting variation is to separate the energy for sensing channel occupancy from that for estimating channel fading conditions; the latter would be consumed only if the channel is sensed to be idle. This variation is easily incorporated into the framework developed here.
III. A Decision-Theoretic Framework

In this section, we formulate the optimal energy-constrained OSA design as a POMDP. This formulation allows us to incorporate the secondary user’s residual energy into sensing and access decisions at the MAC layer. We show that the optimal energy-constrained OSA strategy is given by the optimal policies of this POMDP.

A. Sequential Decision-Making

Sensing Decision: At the beginning of slot $t$, based on its statistical knowledge of the SOS and its current residual energy, the secondary user first determines its operating mode in this slot: sleeping or sensing. If the sleeping mode is chosen, no more decisions need to be made in this slot. Otherwise, the user chooses a channel $n$ to sense. Let 0 represent the sleeping mode. We define the sensing action $a(t)$ as

$$a(t) \in \{0\ (\text{sleeping}), 1, \ldots, N\}. \tag{4}$$

Sensing Observation: Suppose that the user has decided to sense channel $a(t) \in \{1, \ldots, N\}$ in this slot. Then, the user observes the occupancy state and the fading condition of this channel (see Section IV-E for implementation details). Combining these two observations, we define the sensing outcome $\Theta(t)$ as

$$\Theta(t) \in \{0\ (\text{busy}), 1, \ldots, L\} \tag{5}$$

where $\Theta(t) = 0$ indicates that the chosen channel is busy, and $\Theta(t) = k > 0$ indicates that the chosen channel is idle and the fading condition requires the user to transmit at the $k$th power level.

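Given $\mathbf{S}(t) = \mathbf{s} \in \mathcal{S}$, the conditional PMF of the sensing outcome $\Theta(t)$ for channel $a(t) > 0$ is given by

$$U_a(k \mid \mathbf{s}) = \Pr\{\Theta(t) = k \mid \mathbf{S}(t) = \mathbf{s}\} = \begin{cases} P_a(k), & \text{if } s_a = 1,\ k > 0 \\ 1, & \text{if } s_a = 0,\ k = 0 \\ 0, & \text{otherwise} \end{cases} \tag{6}$$

where $P_a(k)$ is determined by channel fading statistics and is defined in (2).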
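A direct transcription of (6) makes the error-free sensing assumption explicit: the outcome is deterministic ($k = 0$) when the sensed channel is busy and follows the fading PMF (2) when it is idle. This is a minimal sketch; the function name, the 1-based channel indexing, and the dictionary layout of the PMFs are hypothetical choices.

```python
def observation_pmf(k, s, a, P):
    """U_a(k | s) from (6) for error-free sensing.

    k : sensing outcome, 0 = busy, 1..L = idle and power level k needed
    s : SOS tuple with s[n-1] in {0 (busy), 1 (idle)} for channel n
    a : sensed channel, 1..N (a > 0)
    P : P[a][k-1] = P_a(k), the power-level PMF in (2) (dict layout is an assumption)
    """
    idle = s[a - 1] == 1
    if idle and k > 0:
        return P[a][k - 1]
    if not idle and k == 0:
        return 1.0
    return 0.0


P = {1: [0.5, 0.3, 0.2], 2: [0.7, 0.2, 0.1]}  # hypothetical PMFs, L = 3
s = (1, 0)  # channel 1 idle, channel 2 busy
print([observation_pmf(k, s, 1, P) for k in range(4)])  # [0.0, 0.5, 0.3, 0.2]
print([observation_pmf(k, s, 2, P) for k in range(4)])  # [1.0, 0.0, 0.0, 0.0]
```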

Access Decision: After observing $\Theta(t)$ from the chosen channel, the user determines whether to access. Let $\Phi_k(t)$ denote the access decision given $\Theta(t) = k$:

$$\Phi_k(t) \in \{0\ (\text{no access}), 1\ (\text{access})\}. \tag{7}$$

Note that to avoid collisions with primary users, the user should refrain from transmission when the channel is sensed as busy: $\Phi_0(t) = 0$ (note that, from (6), $U_a(0 \mid \mathbf{s}) = 1$ whenever $s_a = 0$). Furthermore, the user should not access when its residual energy is insufficient for accessing the channel in the current fading condition. With the above in mind, we define a set $\Phi(e \mid a, k)$ of admissible access decisions when the user has residual energy $E(t) = e$ and obtains sensing outcome $\Theta(t) = k$ at channel $a(t) > 0$:

$$\Phi(e \mid a, k) \triangleq \begin{cases} \{0\}, & \text{if } k = 0 \text{ or } e < e_s + \varepsilon_k \\ \{0, 1\}, & \text{otherwise} \end{cases} \tag{8}$$

where $\varepsilon_k$ is the energy required for a successful transmission under the current sensing outcome $\Theta(t) = k$. Hence, the access decisions $\boldsymbol{\Phi}(t) = [\Phi_0(t), \Phi_1(t), \ldots, \Phi_L(t)]$ for the different sensing outcomes should be chosen from the composite set $\Phi(e \mid a)$:

$$\Phi(e \mid a) = \Phi(e \mid a, 0) \times \cdots \times \Phi(e \mid a, L). \tag{9}$$

At the end of the slot, the user updates its statistical knowledge of the SOS by incorporating its decisions and observations in this slot (see Section III-B for details). Depending on its sensing and access decisions, the user’s residual energy is reduced from $E(t) = e$ to

$$E(t+1) = \begin{cases} e - e_p, & \text{if } a(t) = 0 \\ e - e_s - \Phi_k(t)\,\varepsilon_k, & \text{if } a(t) > 0 \text{ and } \Theta(t) = k. \end{cases} \tag{10}$$

Note that when $\Theta(t) = 0$, the user should not access ($\Phi_0(t) = 0$); its residual energy is reduced to $E(t+1) = e - e_s$. The updated SOS statistics, together with the reduced residual energy $E(t+1)$, are then used by the user to make optimal decisions in slot $t+1$. The above procedure repeats until the secondary user is incapable of successful transmission under any channel fading condition, i.e., $E(t) < e_s + \varepsilon_1$.

B. A POMDP Formulation

We show that the sequential decision-making process described above can be formulated as a POMDP. Specifically, the system state is characterized by the SOS of the primary network $\mathbf{S}(t)$ and the residual energy $E(t)$ of the secondary user.³ While the residual energy is fully observable to the user, the current SOS of the primary network cannot be directly observed due to partial spectrum monitoring. We thus have a POMDP with a random horizon determined by the stopping time $T = \min\{t > 0 : E(t) < e_s + \varepsilon_1\}$.

³If a Markovian fading model is adopted, the system state should also include the fading conditions $\mathbf{C} = [C_1(t), \ldots, C_N(t)]$, where $C_n(t)$ represents the current fading condition of channel $n$. Due to partial spectrum monitoring, the fading conditions $\mathbf{C}$ are also partially observable.

Sufficient Statistics: At the beginning of slot $t$, the user’s statistical knowledge of the SOS is provided by its decision and observation history $H(t) = \{a(\tau), \Theta(\tau)\}_{\tau=1}^{t-1}$. As shown in [15], a sufficient statistic for the SOS is given by a belief vector $\boldsymbol{\Lambda}(t) = \{\lambda_{\mathbf{s}}(t)\}_{\mathbf{s} \in \mathcal{S}}$ of size $2^N$, where each element $\lambda_{\mathbf{s}}(t)$ represents the conditional probability (given the decision and observation history $H(t)$) that the SOS is given by $\mathbf{S}(t) = \mathbf{s}$, i.e.,

$$\lambda_{\mathbf{s}}(t) = \Pr\{\mathbf{S}(t) = \mathbf{s} \mid H(t)\}. \tag{11}$$

At the beginning of slot $t+1$, the belief vector $\boldsymbol{\Lambda}(t+1)$ can be obtained from $\boldsymbol{\Lambda}(t)$ by incorporating the sensing decision $a(t)$ and possibly the observation $\Theta(t)$ in slot $t$. Specifically, when the user chooses to operate in the sleeping mode ($a(t) = 0$), no observation is made, and the belief vector is updated based solely on the underlying Markovian model of the primary traffic:

$$\boldsymbol{\Lambda}(t+1) = \{\lambda_{\mathbf{s}}(t+1)\}_{\mathbf{s} \in \mathcal{S}} = \mathcal{T}(\boldsymbol{\Lambda}(t) \mid 0)$$

where

$$\lambda_{\mathbf{s}}(t+1) = \sum_{\mathbf{s}' \in \mathcal{S}} \lambda_{\mathbf{s}'}(t)\, P_s(\mathbf{s} \mid \mathbf{s}'). \tag{12}$$
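Both belief updates fit in a few lines: the prediction step (12) for the sleeping mode and, since the sensing case described next reuses it, the Bayes-rule correction of (13). This minimal sketch stores the belief vector as a dictionary over SOS tuples; the function names and the single-channel numbers are hypothetical.

```python
def predict(belief, P):
    """Sleeping-mode update T(Lambda | 0) in (12).

    belief : dict mapping each SOS tuple s to lambda_s(t)
    P      : dict with P[(s_prev, s)] = Ps(s | s_prev), as in (1)
    """
    return {s: sum(belief[sp] * P[(sp, s)] for sp in belief) for s in belief}


def sense_update(belief, P, likelihood):
    """Sensing-mode update T(Lambda | a, k) in (13).

    likelihood(s) returns U_a(k | s) from (6) for the observed outcome k.
    Bayes' rule first conditions lambda(t) on the observation of S(t); the
    Markov prediction of (12) then carries the posterior to slot t + 1.
    """
    weights = {s: belief[s] * likelihood(s) for s in belief}
    norm = sum(weights.values())  # equals O_a(k); positive for any outcome that can occur
    return predict({s: w / norm for s, w in weights.items()}, P)


# Hypothetical single channel: Pr{idle | busy} = 0.2, Pr{idle | idle} = 0.9.
P = {((0,), (0,)): 0.8, ((0,), (1,)): 0.2, ((1,), (0,)): 0.1, ((1,), (1,)): 0.9}
belief = {(0,): 0.5, (1,): 0.5}
print(predict(belief, P)[(1,)])                                   # after sleeping
print(sense_update(belief, P, lambda s: float(s[0] == 0))[(1,)])  # after sensing busy (k = 0)
```

With error-free sensing of a single channel, the posterior after sensing is a point mass, so the next-slot belief is simply the corresponding row of the transition matrix.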


When the user chooses a channel $a(t) > 0$ to sense, the belief vector can be updated using Bayes’ rule based on the sensing outcome $\Theta(t) = k$:

$$\boldsymbol{\Lambda}(t+1) = \mathcal{T}(\boldsymbol{\Lambda}(t) \mid a, k)$$

where

$$\lambda_{\mathbf{s}}(t+1) = \frac{\sum_{\mathbf{s}' \in \mathcal{S}} \lambda_{\mathbf{s}'}(t)\, P_s(\mathbf{s} \mid \mathbf{s}')\, U_a(k \mid \mathbf{s}')}{\sum_{\mathbf{s}' \in \mathcal{S}} \lambda_{\mathbf{s}'}(t)\, U_a(k \mid \mathbf{s}')}. \tag{13}$$

The belief vector $\boldsymbol{\Lambda}(t)$ together with the residual energy $E(t) = e$ is a sufficient statistic⁴ for the system state $(\mathbf{S}(t), E(t))$. That is, $(\boldsymbol{\Lambda}(t), e)$, referred to as the information state, is sufficient for making optimal sensing and access decisions. A sensing policy $\pi_s$ is thus given by a sequence of functions $\pi_s = [\mu_1, \mu_2, \ldots]$, where $\mu_t$ maps an information state $(\boldsymbol{\Lambda}, e)$ to a sensing decision $a(t) \in \{0, 1, \ldots, N\}$ in slot $t$. Given sensing policy $\pi_s$, an access policy $\pi_c$ is given by a sequence of functions $\pi_c = [\nu_1, \nu_2, \ldots]$, where $\nu_t$ maps an information state $(\boldsymbol{\Lambda}, e)$ satisfying $a(t) = \mu_t(\boldsymbol{\Lambda}, e) > 0$ (i.e., the user operates in the sensing mode) to an admissible access decision $\boldsymbol{\Phi}(t) \in \Phi(e \mid a)$. If the functions $\mu_t$ ($\nu_t$) are identical for all $t$, then $\pi_s$ ($\pi_c$) is a stationary policy.

⁴If a Markovian fading model is adopted, the sufficient statistic consists of three parameters: the belief vector, the residual energy, and the conditional distribution (given the decision and observation history) of the fading conditions $\mathbf{C}$.

Reward and Objective: A natural definition of the reward is the number of bits delivered by the user in a slot. The immediate reward $R(t)$ can thus be written as

$$R(t) = \begin{cases} 0, & a(t) = 0 \\ g_a(B_a)\,\Phi_k(t), & a(t) > 0,\ \Theta(t) = k \end{cases} \tag{14}$$

where $g_a(\cdot)$ is a given function of the channel bandwidth $B_a$, determined by the modulation and coding scheme used by the user. For simplicity, we assume $g_a(B_a) = B_a$.

The expected total reward of this POMDP over a random time horizon represents the expected total number of bits delivered by the user during its battery lifetime. The optimal sensing and access policies are thus given by

$$\{\pi_s^*, \pi_c^*\} = \arg\max_{\pi_s, \pi_c} \mathbb{E}\left[\sum_{t=1}^{T} R(t) \,\middle|\, \boldsymbol{\Lambda}(1), E(1) = \bar{e}\right] \tag{15}$$

where the initial belief vector $\boldsymbol{\Lambda}(1)$ can be set to the stationary distribution of the SOS if no information about the initial state is available.

IV. Optimal Energy-Constrained OSA Design

In this section, we tackle the optimal MAC design for energy-constrained OSA defined in (15). We first show that the optimal sensing and access policies $\{\pi_s^*, \pi_c^*\}$ are stationary and then derive recursive formulas for solving (15). We also show the structure of the optimal solution and describe an efficient algorithm for obtaining the optimal decisions. At the end of this section, we discuss the distributed implementation of the optimal MAC design.


A. Stationary Optimal Policy

Stationary policies are usually preferred due to their reduced memory requirements and low implementation complexity. The fact that the user consumes nonzero energy in each slot and that its battery has finite initial energy implies that the system always reaches a terminating state (i.e., $E(t) < e_s + \varepsilon_1$) in a finite but random time. The inevitable termination makes the energy-constrained OSA design an example of a stochastic shortest path problem, which always has a stationary optimal policy [13].

Proposition 1: For the energy-constrained OSA design given by (15), there exist stationary optimal sensing and access policies.

Proof: See Appendix A. ■

B. Value Function

Proposition 1 allows us to focus on stationary policies without losing optimality. We can thus omit the time index $t$ for notational convenience. The next step toward solving (15) is to express the objective explicitly as a function of the information state $(\boldsymbol{\Lambda}, e)$ and the sensing and access actions $\{a, \boldsymbol{\Phi}\}$.

Let $Q_a(\boldsymbol{\Lambda}, e)$ denote the action-value function, or Q-function, which represents the maximum expected total reward that can be obtained by taking sensing action $a \in \{0, \ldots, N\}$ in the current slot when the information state is $(\boldsymbol{\Lambda}, e)$. The value function, denoted by $V(\boldsymbol{\Lambda}, e)$, is the maximum expected total reward that can be accumulated starting from information state $(\boldsymbol{\Lambda}, e)$. The value function $V(\boldsymbol{\Lambda}, e)$ and the corresponding optimal sensing action $a^*(\boldsymbol{\Lambda}, e)$ are given by

$$V(\boldsymbol{\Lambda}, e) = \max_{a \in \{0, 1, \ldots, N\}} Q_a(\boldsymbol{\Lambda}, e), \qquad a^*(\boldsymbol{\Lambda}, e) = \arg\max_{a \in \{0, 1, \ldots, N\}} Q_a(\boldsymbol{\Lambda}, e). \tag{16}$$

Since no reward will be earned after the user’s residual energy $E(t)$ drops below the minimum energy requirement $e_s + \varepsilon_1$, we have $V(\boldsymbol{\Lambda}, e) = 0$ for all information states $(\boldsymbol{\Lambda}, e)$ with $e < e_s + \varepsilon_1$.

Next, we derive iterative formulas for calculating the value function $V(\boldsymbol{\Lambda}, e)$ and the action-value functions $Q_a(\boldsymbol{\Lambda}, e)$.

1) Sleeping Mode: In the sleeping mode ($a = 0$), the user consumes energy $e_p$, and no reward is earned in this slot. The action-value function $Q_0(\boldsymbol{\Lambda}, e)$ is thus given by the maximum expected remaining reward from the next slot:

$$Q_0(\boldsymbol{\Lambda}, e) = V(\mathcal{T}(\boldsymbol{\Lambda} \mid 0), e - e_p) \tag{17}$$

where $\mathcal{T}(\boldsymbol{\Lambda} \mid 0)$ is the updated belief vector given in (12).

2) Sensing Mode: If the user chooses channel $a > 0$ to sense, it will observe a sensing outcome $\Theta = k$ with probability

$$O_a(k) \triangleq \Pr\{\Theta = k \mid \lambda\} = \sum_{s} \lambda_s\, U_a(k \mid s) \tag{18}$$

where $U_a(k \mid s)$ is the conditional observation probability given in (6).
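As a sanity check on (18), the sketch below computes $O_a(k)$ from a joint belief over a small SOS. Everything numeric here is an assumption for illustration: two independent channels, the idle probabilities, the fading PMF `p`, and the perfect-sensing observation model `U_1` (the actual $U_a(k \mid s)$ is whatever (6) defines). Under perfect sensing the result collapses to $O_a(0) = 1 - \omega_a$ and $O_a(k) = \omega_a p_a(k)$ for $k \geq 1$.

```python
from itertools import product


def observation_probs(belief, U_a, L):
    # O_a(k) = sum over SOS states s of lambda_s * U_a(k | s), for k = 0..L   (eq. 18)
    return [sum(lam * U_a(k, s) for s, lam in belief.items()) for k in range(L + 1)]


# Assumed example: N = 2 independent channels; S_n = 1 means "idle".
omega = [0.5, 0.5]             # marginal idle probabilities (assumed)
p = [0.4, 0.3, 0.2, 0.1]       # p(1..4): fading PMF of an idle channel (assumed)
L = len(p)

# Joint belief lambda over S = (S_1, S_2), built from the independent marginals.
belief = {
    s: (omega[0] if s[0] else 1 - omega[0]) * (omega[1] if s[1] else 1 - omega[1])
    for s in product((0, 1), repeat=2)
}


def U_1(k, s):
    # Hypothetical perfect-sensing model for channel 1: outcome 0 iff the channel is
    # busy; otherwise outcome k = 1..L with probability p(k).
    if s[0] == 0:
        return 1.0 if k == 0 else 0.0
    return 0.0 if k == 0 else p[k - 1]


O = observation_probs(belief, U_1, L)
print([round(x, 3) for x in O])  # [0.5, 0.2, 0.15, 0.1, 0.05]
```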
Given sensing outcome $\Theta = k$ at the chosen channel $a$, we can calculate the conditional maximum expected reward $Q_a(\lambda, e \mid k, \Phi_k)$ achieved by adopting an admissible access decision $\Phi_k \in \mathcal{G}(e \mid a, k)$. Specifically, $Q_a(\lambda, e \mid k, \Phi_k)$ consists of two parts: (i) the immediate reward obtained in this slot, which is given by (14); (ii) the maximum expected remaining reward $V(\lambda', e')$ starting from the updated information state $(\lambda', e')$, where $\lambda' = T(\lambda \mid a, k)$, given in (13), represents the updated knowledge of the SOS after incorporating sensing action $a$ and observation $\Theta = k$, and $e' = e - e_s - \Phi_k \varepsilon_k$ is the reduced residual energy. We arrive at

$$Q_a(\lambda, e \mid k, \Phi_k) = B_a \Phi_k + V\big(T(\lambda \mid a, k),\, e - e_s - \Phi_k \varepsilon_k\big). \tag{19}$$

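Optimizing over all admissible access decisions and then averaging over sensing outcomes $\Theta = k$, we obtain the maximum expected reward achieved by choosing channel $a > 0$ and the corresponding optimal access decision $\Phi_k^*(\lambda, e \mid a)$ as

$$Q_a(\lambda, e) = \sum_{k=0}^{L} O_a(k) \max_{\Phi_k \in \mathcal{G}(e \mid a, k)} Q_a(\lambda, e \mid k, \Phi_k), \qquad \Phi_k^*(\lambda, e \mid a) = \arg\max_{\Phi_k \in \mathcal{G}(e \mid a, k)} Q_a(\lambda, e \mid k, \Phi_k). \tag{20}$$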


C. Solution Structure

We note that obtaining the optimal sensing and access decisions hinges on the computation of the action-value and value functions. We thus seek structures of the value function that lead to efficient computation of the optimal decisions.

1) Reduced Dimension: One of the difficulties in calculating the value function $V(\lambda, e)$ is that the dimension of the belief vector $\lambda$ grows exponentially with the number $N$ of channels. It has been shown in [4] and [23] that for independently evolving channels, an alternative sufficient statistic for the SOS is given by the marginal distribution $\Omega(t) = [\omega_1(t), \ldots, \omega_N(t)]$ of the SOS, where $\omega_n(t)$ denotes the probability (conditioned on the entire decision and observation history $H(t)$) that channel $n$ is idle at the beginning of slot $t$:

$$\omega_n(t) \triangleq \Pr\{S_n(t) = 1 \mid H(t)\}. \tag{21}$$

Let $\alpha = [\alpha_1, \ldots, \alpha_N]$ and $\beta = [\beta_1, \ldots, \beta_N]$ denote the transition probabilities of the channels, where $\alpha_n = \Pr\{S_n(t) = 1 \mid S_n(t-1) = 0\}$ and $\beta_n = \Pr\{S_n(t) = 1 \mid S_n(t-1) = 1\}$.

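We can then obtain belief updates similar to (12) and (13). Specifically, when the user operates in the sleeping mode, we have

$$\Omega(t+1) = \tilde{T}\big(\Omega(t) \mid 0\big)$$

where

$$\omega_n(t+1) = \alpha_n + (\beta_n - \alpha_n)\,\omega_n(t). \tag{22}$$

When the user chooses channel $a(t) > 0$, the belief vector $\Omega(t+1)$ is updated according to the sensing outcome $\Theta(t) = k$:

$$\Omega(t+1) = \tilde{T}\big(\Omega(t) \mid a, k\big)$$

where

$$\omega_n(t+1) = \begin{cases} \alpha_n + (\beta_n - \alpha_n)\,\omega_n(t), & \text{if } n \neq a(t) \\ \beta_n, & \text{if } n = a(t),\ k > 0 \\ \alpha_n, & \text{if } n = a(t),\ k = 0. \end{cases} \tag{23}$$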
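The updates (22) and (23) are simple enough to state directly in code. The sketch below is illustrative only: channels are 1-indexed as in the text, `a = 0` means sleeping, and, as (23) does, the sensed channel's state is treated as revealed by the outcome. The function name `update_marginal_belief` is our own. The usage lines replay the first two slots of Table I ($\alpha = 0.7$, $\beta = 0.3$).

```python
def update_marginal_belief(omega, alpha, beta, a=0, k=None):
    """Return Omega(t+1) = T~(Omega(t) | 0) if a == 0, else T~(Omega(t) | a, k).

    omega, alpha, beta: per-channel lists; channel n of the text is index n - 1 here.
    a: 0 for sleeping, otherwise the 1-based index of the sensed channel.
    k: sensing outcome for channel a (0 = busy, k > 0 = idle in fading state k).
    """
    new = []
    for n, (w, al, be) in enumerate(zip(omega, alpha, beta), start=1):
        if a != 0 and n == a:
            new.append(be if k > 0 else al)   # (23): state of sensed channel revealed
        else:
            new.append(al + (be - al) * w)    # (22)/(23): one-step Markov prediction
    return new


alpha, beta = [0.7, 0.7], [0.3, 0.3]   # Table I transition probabilities
omega = [0.5, 0.5]                      # stationary: alpha / (1 + alpha - beta) = 0.5
omega = update_marginal_belief(omega, alpha, beta, a=1, k=1)  # sense channel 1: idle
print([round(w, 2) for w in omega])     # [0.3, 0.5]
omega = update_marginal_belief(omega, alpha, beta)            # sleep
print([round(w, 2) for w in omega])     # [0.58, 0.5]
```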

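Following Section IV-B, we can also develop a simpler recursion for the value function $V(\Omega, e)$:

$$V(\Omega, e) = \max_{a \in \{0, 1, \ldots, N\}} Q_a(\Omega, e) \tag{24}$$

where

$$Q_0(\Omega, e) = V\big(\tilde{T}(\Omega \mid 0),\, e - e_p\big) \tag{25a}$$

$$Q_a(\Omega, e) = (1 - \omega_a)\, V\big(\tilde{T}(\Omega \mid a, 0),\, e - e_s\big) + \omega_a \sum_{k=1}^{L} p_a(k) \max_{\Phi_k \in \mathcal{G}(e \mid a, k)} Q_a(\Omega, e \mid k, \Phi_k), \quad a > 0 \tag{25b}$$

$$Q_a(\Omega, e \mid k, \Phi_k) = B_a \Phi_k + V\big(\tilde{T}(\Omega \mid a, k),\, e - e_s - \Phi_k \varepsilon_k\big), \quad a > 0. \tag{25c}$$

Compared with the original value function $V(\lambda, e)$ developed in Section IV-B, the above value function not only has simpler belief updates $\tilde{T}$ but also avoids computing the summation in (18).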
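Because every action lowers the residual energy, the recursion (24)-(25) terminates and can be evaluated top-down with memoization. The sketch below does this for the marginal belief $\Omega$ under perfect sensing. All parameter values (two channels, $\alpha$, $\beta$, $p_a(k)$, $e_p$, $e_s$, $\varepsilon_k$, $B_a$) are assumptions chosen to resemble Table I. Energies are scaled to integers (units of 0.1) and beliefs are exact fractions so that cached states do not split into floating-point near-duplicates.

```python
from fractions import Fraction as F
from functools import lru_cache

# Assumed example parameters (two channels, loosely following Table I).
ALPHA = (F(7, 10), F(7, 10))                    # alpha_n = Pr{idle | previously busy}
BETA = (F(3, 10), F(3, 10))                     # beta_n  = Pr{idle | previously idle}
B = (1, 1)                                      # reward B_a for one access
P = ((F(4, 10), F(3, 10), F(2, 10), F(1, 10)),  # p_1(k), k = 1..L
     (F(2, 10), F(3, 10), F(4, 10), F(1, 10)))  # p_2(k)
E_SLEEP, E_SENSE = 1, 5                         # e_p = 0.1, e_s = 0.5 (units of 0.1)
EPS = (10, 20, 30, 40)                          # eps_k = 1, 2, 3, 4
N, L = len(ALPHA), len(EPS)


def predict(omega):
    # (22): one-step prediction for every channel.
    return tuple(ALPHA[n] + (BETA[n] - ALPHA[n]) * w for n, w in enumerate(omega))


def after_sensing(omega, n, idle):
    # (23): sensed channel n (0-based) is revealed; the other channels are predicted.
    pred = predict(omega)
    return pred[:n] + (BETA[n] if idle else ALPHA[n],) + pred[n + 1:]


@lru_cache(maxsize=None)
def V(omega, e):
    # (24); no reward can be earned once e < e_s + eps_1.
    if e < E_SENSE + EPS[0]:
        return F(0)
    return max(Q(a, omega, e) for a in range(N + 1))


def Q(a, omega, e):
    if a == 0:                                   # (25a): sleep
        return V(predict(omega), e - E_SLEEP)
    n = a - 1
    # (25b): busy with probability 1 - omega_a, no access possible.
    value = (1 - omega[n]) * V(after_sensing(omega, n, False), e - E_SENSE)
    idle_state = after_sensing(omega, n, True)
    for k in range(1, L + 1):
        best = V(idle_state, e - E_SENSE)        # (25c) with Phi_k = 0
        if e >= E_SENSE + EPS[k - 1]:            # Phi_k = 1 is admissible
            best = max(best, B[n] + V(idle_state, e - E_SENSE - EPS[k - 1]))
        value += omega[n] * P[n][k - 1] * best
    return value


omega0 = (F(1, 2), F(1, 2))   # stationary marginal beliefs
print(V(omega0, 15))          # 1/5
print(float(V(omega0, 40)))
```

At $e = 1.5$ (15 units) only one access at fading state $k = 1$ is affordable, so sensing channel 1 yields $\omega_1 p_1(1) B_1 = 0.5 \times 0.4 = 0.2$, which is the printed value.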

2) Monotonicity: Monotonicity results for the value function are usually desirable, since they not only provide insights into the underlying problem but also serve as a stepping stone for establishing the structure of optimal policies (see [14] for an example). In Proposition 2, we study the monotonicity of the value function with respect to each of its arguments.

Proposition 2: Monotonicity of Value Function

P2.1) The value function $V(\lambda, e)$ is monotonically increasing with the residual energy $e \in \mathcal{E}$, i.e., $V(\lambda, e) \geq V(\lambda, e')$ for $e \geq e'$.

P2.2) Assume that the SOS evolves independently across channels. If $\beta \geq \alpha$, then the value function $V(\Omega, e)$ given in (24) is monotonically increasing with the belief vector $\Omega$, i.e., $V(\Omega, e) \geq V(\Omega', e)$ for $\Omega \geq \Omega'$.

Proof: See Appendix B. ■

P2.1) is straightforward. P2.2) considers the case where the SOS evolves independently across channels. It provides a sufficient condition for the value function $V(\Omega, e)$ to be monotonically increasing with the belief vector $\Omega$ defined in (21). Note that $\beta > \alpha$ represents the case where the channel occupancy state is positively correlated across time. In this case, a larger current belief vector $\Omega(t)$ indicates a larger probability that the channels will be idle in each future slot, leading to a higher chance of obtaining rewards. When $\beta < \alpha$, the channel occupancy state is negatively correlated across time, and the value function is not necessarily monotonic. This is because when $\beta < \alpha$, a larger belief vector $\Omega(t)$ indicates a smaller probability that the channels are idle in the next slot, so the probabilities of the channels being idle oscillate over time.
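The contrast between $\beta > \alpha$ and $\beta < \alpha$ can be seen by iterating the sleeping-mode update (22) for a single channel. The sketch below uses assumed values ($\alpha, \beta \in \{0.3, 0.7\}$, both with stationary idle probability 0.5) and a made-up starting belief of 0.9; it only illustrates the monotone versus oscillating convergence described above.

```python
def belief_trajectory(omega0, alpha, beta, steps):
    # Iterate (22) with no sensing: omega(t+1) = alpha + (beta - alpha) * omega(t).
    traj = [omega0]
    for _ in range(steps):
        traj.append(alpha + (beta - alpha) * traj[-1])
    return traj


pos = belief_trajectory(0.9, alpha=0.3, beta=0.7, steps=4)   # beta > alpha
neg = belief_trajectory(0.9, alpha=0.7, beta=0.3, steps=4)   # beta < alpha
print([round(w, 4) for w in pos])  # [0.9, 0.66, 0.564, 0.5256, 0.5102]
print([round(w, 4) for w in neg])  # [0.9, 0.34, 0.564, 0.4744, 0.5102]
```

With $\beta > \alpha$ the belief decays monotonically toward 0.5, while with $\beta < \alpha$ each step flips the deviation from 0.5 in sign, so a larger current belief maps to a smaller next-slot belief.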
Fig. 1. An illustration of the structure of the value function $V(\lambda, e)$. Each point on the x axis represents a possible belief vector $\lambda$. Each arrow represents the update of an information state $(\lambda, e)$ under certain decisions and observations.


3) Piecewise Linearity and Convexity: It has been shown in [15] that the value function of a POMDP over a finite and fixed time horizon is piecewise linear and convex with respect to the belief vector. In Proposition 3, we show that the value function $V(\lambda, e)$ of a POMDP over a finite but random time horizon also has this property.

Proposition 3: Piecewise Linear and Convex Value Function

The value function $V(\lambda, e)$ given in (16) is piecewise linear and convex with respect to the belief vector $\lambda$. That is, for a given residual energy $e \in \mathcal{E}$, the value function $V(\lambda, e)$ can be written as

$$V(\lambda, e) = \max_{\gamma \in \Gamma_e} \langle \lambda, \gamma \rangle \tag{26}$$

where $\langle \cdot, \cdot \rangle$ denotes the inner product, $\gamma$ is a vector of size $|\mathcal{S}| = 2^N$, and $\Gamma_e$ is a finite set of such vectors $\gamma$.

Proof: The proof proceeds by mathematical induction on the residual energy $e$. See Appendix C. ■

As illustrated in Fig. 1, Proposition 3 shows that the domain of the value function $V(\lambda, e)$ can be partitioned into a finite number of convex regions, each of which is associated with a $\gamma$-vector $\gamma \in \Gamma_e$. The value function at a given belief vector $\lambda$ is simply the inner product of this belief vector and the $\gamma$-vector associated with the region in which $\lambda$ lies; the example in Fig. 1 shows this for the belief vector $\lambda(t)$. Hence, calculating the value function over the entire continuous belief space is equivalent to finding a finite set $\Gamma_e$ of $\gamma$-vectors. Readers are referred to [16]-[20] for different dynamic programming algorithms for constructing $\gamma$-vectors.
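Once $\Gamma_e$ is known, evaluating (26) is a maximum over inner products, and the maximizing index identifies the convex region that contains the belief. The $\gamma$-vectors in the sketch below are made-up numbers for a two-state belief space ($N = 1$), not outputs of the model; the helper name `value_from_gammas` is ours.

```python
def value_from_gammas(belief, gammas):
    """Evaluate V(lambda, e) = max over gamma in Gamma_e of <lambda, gamma>, as in (26).

    Returns the value and the index of the maximizing gamma-vector, i.e., the
    convex region of the belief space that contains `belief`.
    """
    scores = [sum(l * g for l, g in zip(belief, gamma)) for gamma in gammas]
    best = max(range(len(scores)), key=scores.__getitem__)
    return scores[best], best


# Hypothetical Gamma_e for |S| = 2 states (illustrative numbers only).
gammas = [[0.0, 1.0], [0.6, 0.6], [1.0, 0.0]]
print(value_from_gammas([0.5, 0.5], gammas))   # (0.6, 1)
print(value_from_gammas([0.1, 0.9], gammas))   # (0.9, 0)
```

Since the result is a maximum of linear functions of $\lambda$, it is automatically piecewise linear and convex, which is what Proposition 3 asserts for the true $\Gamma_e$.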

D. A Solution Procedure

At the beginning, the secondary user may not have any information about the SOS other than its transition probabilities. Hence, the initial belief vector $\lambda(1)$ is usually set to the stationary distribution $\bar{\lambda}$ of the SOS. We note that given an initial belief vector and an initial energy, the secondary user can experience only a finite number of possible information states $(\lambda, e)$ during its battery lifetime. This is because a belief vector $\lambda(t)$ in a slot can transition only to a finite number of possible belief vectors $\lambda(t+1)$ in the next slot (see Fig. 1), and the POMDP given in (15) terminates in finite time (see Section IV-A). This observation suggests that to obtain optimal sensing and access decisions for a given initial information state, we need to calculate the value function for only a finite number of information states.

Also note that due to energy consumption in sleeping and sensing, the user's residual energy decreases after each slot. Hence, the value function and the action-value function of an information state $(\lambda, e)$ depend only on those with less residual energy. We can thus compute the value function in increasing order of the residual energy $e \in \mathcal{E}$, which leads to the following algorithm for computing the optimal sensing and access decisions.


Algorithm for Computing Optimal Sensing and Access Decisions

S0) According to the initial belief vector $\lambda(1)$ and the initial battery energy $e$, enumerate all possible information states $(\lambda, e)$ that the user may experience during its battery lifetime. Let $\mathcal{U}$ include all such $(\lambda, e)$ with $e \geq e_s + \varepsilon_1$.

S1) Let $V(\lambda, e) = 0$ for all $(\lambda, e)$ with $e \in \mathcal{E}$ and $e < e_s + \varepsilon_1$.

S2) Use (16), (17), (19), and (20) to calculate the value function for the information state $(\lambda, e) \in \mathcal{U}$ satisfying $e \leq e'$ for all $(\lambda', e') \in \mathcal{U}$.

S3) Remove $(\lambda, e)$ from the set $\mathcal{U}$, i.e., $\mathcal{U} = \mathcal{U} \setminus \{(\lambda, e)\}$. If $\mathcal{U}$ is nonempty, go to S2). Otherwise, stop the calculation.
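A bottom-up sketch of S0) to S3) for the single-channel case, where the scalar belief $\omega$ suffices. It uses the marginal recursion (24)-(25) with perfect sensing in place of the joint-belief formulas named in S2), and it also returns the lookup table of decisions discussed next. $\alpha$, $\beta$, $e_p$, $\varepsilon_k$, and $p(k)$ follow the Fig. 3 caption; $e_s = 0.5$ is an assumption. Energies are integers in units of 0.1 and beliefs are exact fractions.

```python
from fractions import Fraction as F

# Assumed single-channel (N = 1) parameters; energies in units of 0.1.
ALPHA, BETA = F(3, 10), F(7, 10)
B, E_SLEEP, E_SENSE = 1, 1, 5                  # B = 1, e_p = 0.1, e_s = 0.5
EPS = (10, 20, 30, 40)                         # eps_k = 1, 2, 3, 4 for k = 1..L
P = (F(2, 10), F(3, 10), F(3, 10), F(2, 10))   # p(k): fading PMF of an idle channel
E_MIN = E_SENSE + EPS[0]                       # minimum energy requirement e_s + eps_1


def successors(w, e):
    """All information states reachable from (w, e) in one slot, under any decision."""
    nxt = {(ALPHA + (BETA - ALPHA) * w, e - E_SLEEP),   # sleep, update (22)
           (ALPHA, e - E_SENSE),                        # sensed busy
           (BETA, e - E_SENSE)}                         # sensed idle, no access
    nxt |= {(BETA, e - E_SENSE - eps) for eps in EPS if e >= E_SENSE + eps}
    return nxt


def solve(w1, e1):
    # S0) enumerate every information state reachable from (w1, e1).
    seen, stack = set(), [(w1, e1)]
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        if state[1] >= E_MIN:
            stack.extend(successors(*state))
    # S1) zero value once sensing plus the cheapest access is unaffordable.
    V = {s: F(0) for s in seen if s[1] < E_MIN}
    policy = {}
    # S2)-S3) process the remaining states in increasing order of residual energy.
    for w, e in sorted((s for s in seen if s[1] >= E_MIN), key=lambda s: s[1]):
        q_sleep = V[(ALPHA + (BETA - ALPHA) * w, e - E_SLEEP)]
        q_sense = (1 - w) * V[(ALPHA, e - E_SENSE)]
        stay = V[(BETA, e - E_SENSE)]
        access = []
        for p, eps in zip(P, EPS):
            if e >= E_SENSE + eps and B + V[(BETA, e - E_SENSE - eps)] >= stay:
                access.append(1)
                q_sense += w * p * (B + V[(BETA, e - E_SENSE - eps)])
            else:
                access.append(0)
                q_sense += w * p * stay
        V[(w, e)] = max(q_sleep, q_sense)
        policy[(w, e)] = (1 if q_sense >= q_sleep else 0, access)
    return V, policy


V, policy = solve(F(1, 2), 40)   # stationary belief alpha / (1 + alpha - beta) = 1/2
sense, access = policy[(F(1, 2), 40)]
print(sense, access, float(V[(F(1, 2), 40)]))
```

Ties are broken toward sensing and toward access; this choice is ours and does not affect the computed values.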


We point out that the optimal sensing and access decisions for all possible information states can be precomputed offline and stored by each user before it operates. At the beginning of each slot, the user simply looks up the optimal decisions using its current information state $(\lambda, e)$. Hence, beyond a table lookup, the proposed optimal OSA design imposes no online computational burden on the user.

E. Distributed Implementation

Next, we show that the optimal energy-constrained OSA strategy obtained under the POMDP framework can be implemented in a distributed fashion.

1) Channel State Acquisition: Suppose that the transmitter and the receiver hop to the same channel at the beginning of a slot. If the channel is sensed as idle, the transmitter adopts carrier sensing (i.e., it waits for a random backoff time before attempting transmission) to avoid collisions among competing secondary users.† If the channel remains idle when its backoff time expires, the transmitter sends a short request-to-send (RTS) message at full power‡ to the receiver. Upon receiving the RTS, the receiver estimates the channel fading condition from the RTS and then replies with a clear-to-send (CTS) message that contains the estimated channel fading condition. After a successful RTS-CTS exchange, the transmitter can adjust its transmission power according to the channel fading condition and communicate with the receiver over this channel.

2) Synchronous Hopping: Suppose that the transmitter and the receiver have tuned to the same channel after the initial handshake (one scheme for the initial handshake can be found in [4] and [23]). To ensure synchronous hopping in the spectrum afterwards without extra control message exchange, the receiver must be aware of the transmitter's sensing decision at the beginning of each slot. For this purpose, the transmitter and the receiver must maintain the same information state $(\lambda, e)$ in each slot.

We point out that when both the transmitter and the receiver can observe the true state of the sensed channel, they will have the same update of the belief vector and the residual energy, thus reaching the same information state. When the transmitter and the receiver are affected by different sets of primary users, or when sensing errors cannot be ignored, the RTS-CTS exchange used for fading state acquisition can be exploited to ensure synchronous hopping between the transmitter and the receiver. In this case, the common observation used for updating the information state is whether the RTS-CTS exchange is successful. A similar discussion on using common observations to ensure synchronous hopping can be found in [5] and [24].

V. Threshold Structures of Optimal Policies

In this section, we study different factors that affect the optimal decisions obtained in Section IV-B. We focus on the operating decision (sleeping versus sensing) and the access decision, which are unique to the energy-constrained OSA problem.

Careful inspection of (17) and (19) reveals that the user's decision affects the total expected reward in three ways: i) it may acquire an immediate reward $B_a$; ii) it transforms the current belief vector $\lambda(t)$ into $\lambda(t+1) = T(\lambda \mid 0)$ or $T(\lambda \mid a, k)$, which summarizes the information about the SOS up to this slot; and iii) it reduces the battery energy. Hence, to maximize the total expected reward during the battery lifetime, optimal decisions should achieve a tradeoff among gaining instantaneous reward, gaining information for future use, and conserving energy.

A. To Sense or Not to Sense?

Without the energy constraint, the user should always operate in the sensing mode, since sensing provides not only a chance to gain immediate access but also statistical information about the SOS. With the energy constraint, however, the user may choose to sleep, since sensing costs energy. In this case, the optimal operating decision should strike a balance between gaining reward/information and conserving energy.

† Carrier sensing among secondary users starts only after the state of the chosen channel has been identified as idle. In other words, a minimum value on the backoff time of secondary users is imposed to ensure the priority of the primary users. This minimum value of the backoff time also allows a secondary user to distinguish transmissions of primary users from those of competing secondary users.

‡ Note that the energy consumed in channel state estimation is absorbed into the sensing energy consumption $e_s$.

1) Analytical Study: We first provide a sufficient condition for the user to operate in the sensing mode.

Proposition 4: When the secondary user's belief vector is given by the stationary distribution $\bar{\lambda}$ of the underlying SOS, its optimal operating mode is to sense, i.e., $a^*(\lambda, e) > 0$ if $\lambda = \bar{\lambda}$.

Proof: See Appendix D. ■

The  intuition  behind  Proposition  4  is  explained  as  follows. 
Suppose  that  the  secondary  user  chooses  to  operate  in  the 
sleeping  mode  when  its  belief  vector  is  given  by  the  stationary 
distribution  of  the  SOS.  Then,  it  will  have  the  same  belief 
vector  but  reduced  residual  energy  at  the  beginning  of  the  next 
slot.  The  energy  consumed  in  sleeping  is  thus  wasted  without 
gaining  any  statistical  information  about  the  SOS.  This  suggests 
that  the  optimal  operating  mode  is  to  sense. 

Next, we consider the single-channel ($N = 1$) case, where the belief vector $\lambda$ can be characterized by a scalar $\omega$ as defined in (21), and the transition probabilities of this channel can be denoted by $\beta = \Pr\{S(t+1) = 1 \mid S(t) = 1\}$ and $\alpha = \Pr\{S(t+1) = 1 \mid S(t) = 0\}$.

Proposition 5: Threshold Optimal Sensing Decision

Consider the single-channel ($N = 1$) case. For any given residual energy $e$, the optimal sensing decision has a threshold structure:

$$a^*(\omega, e) = \begin{cases} 1 \ (\text{sense}), & \text{if } \omega \geq r_{\mathrm{th}}(e) \\ 0 \ (\text{sleep}), & \text{otherwise} \end{cases} \tag{27}$$

where $r_{\mathrm{th}}(e) \in \big[\min\{\alpha, \beta\},\ \alpha/(1 + \alpha - \beta)\big]$ is the optimal sensing threshold.

Proof: See Appendix D. ■

Proposition 5 states that the user should sense when the belief $\omega$ of the channel is large and should sleep when the channel is less likely to be idle, which agrees with intuition.

Corollary 1: Consider the single-channel ($N = 1$) case. When $\alpha = 0$, the secondary user should always operate in the sensing mode, i.e., $a^*(\omega, e) = 1$.

Proof: See Appendix D. ■
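The interval in Proposition 5 depends only on $\alpha$ and $\beta$; its upper end $\alpha/(1+\alpha-\beta)$ is the stationary idle probability of the two-state chain, consistent with Proposition 4. The sketch below computes the interval and applies rule (27) for a given threshold. It does not compute $r_{\mathrm{th}}(e)$ itself, which requires the value function; the function names are ours and the numeric values are assumptions. With $\alpha = 0$ the interval collapses to $\{0\}$, which reproduces Corollary 1.

```python
def sensing_threshold_bounds(alpha, beta):
    # Interval [min{alpha, beta}, alpha / (1 + alpha - beta)] containing r_th(e) (Prop. 5).
    return min(alpha, beta), alpha / (1 + alpha - beta)


def sense_decision(omega, r_th):
    # Threshold rule (27): sense (1) iff omega >= r_th, otherwise sleep (0).
    return 1 if omega >= r_th else 0


lo, hi = sensing_threshold_bounds(0.3, 0.7)
print(round(lo, 3), round(hi, 3))   # 0.3 0.5
lo, hi = sensing_threshold_bounds(0.0, 0.6)
print(lo, hi)                       # 0.0 0.0
print([sense_decision(w, hi) for w in (0.0, 0.2, 1.0)])   # [1, 1, 1]
```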

2) Numerical Study: As indicated by Proposition 5, the user's residual energy $e$ affects the optimal operating decision through the sensing threshold $r_{\mathrm{th}}(e)$. To study the role of the residual energy $e$ in choosing operating modes, we plot the optimal sensing threshold $r_{\mathrm{th}}(e)$ in Fig. 2 for different sensing energy consumption $e_s$ and channel occupancy statistics.

We find that the optimal sensing threshold $r_{\mathrm{th}}(e)$ is highly dependent on the user's residual energy $e$ when $e$ is small. As $e$ increases, the impact of the residual energy $e$ on the user's operating decision $r_{\mathrm{th}}(e)$ diminishes. When $e$ is sufficiently large, the optimal sensing threshold $r_{\mathrm{th}}(e)$ becomes independent of the residual energy $e$. This observation implies that when the battery is nearly depleted, the user should focus more on how to fully utilize its residual energy.
Fig. 2. Optimal sensing threshold $r_{\mathrm{th}}(e)$. $B = 1$ (bandwidth), $e_s = 0.3, 0.5$ (sensing energy), $e_p = 0.1$ (sleeping energy), $\varepsilon_{\mathrm{tx}} \in \{1, 2, 3, 4\}$ (transmission energy), $N = 1$ (number of channels), $[p(1), p(2), p(3), p(4)] = [0.2, 0.3, 0.3, 0.2]$ (channel fading statistics).

Fig. 3. Optimal access thresholds $k_{\mathrm{th}}(\lambda, e)$. $B = 1$ (bandwidth), $\beta = 0.7$, $\alpha = 0.3$ (transition probabilities), $e_p = 0.1$ (sleeping energy), $\varepsilon_{\mathrm{tx}} \in \{1, 2, 3, 4\}$ (transmission energy), $[p(1), p(2), p(3), p(4)] = [0.2, 0.3, 0.3, 0.2]$ (channel fading statistics).


We also see that the optimal sensing threshold $r_{\mathrm{th}}(e)$ fluctuates more dramatically when the channel occupancy state is negatively correlated (i.e., $\beta < \alpha$); that is, in this case the residual energy plays a more important role in decision-making. As explained below Proposition 2, the probability that the channel is idle fluctuates when $\beta < \alpha$. Hence, in this case the user should focus more on its residual energy, saving energy for those slots in which the channel is more likely to be idle.

Furthermore, we find that the optimal sensing threshold $r_{\mathrm{th}}(e)$ increases with the sensing energy consumption $e_s$. That is, the user should be more conservative in making operating decisions when $e_s$ is large. This observation agrees with our expectation: when $e_s$ is large, the extra energy consumed in sensing pays off only when the chance of gaining immediate access is high. On the other hand, when $e_s$ is comparable to $e_p$, the user can afford to sense more often to gain statistical information.

B.  To  Access  or  Not  to  Access? 


Without the energy constraint, the user should always access an idle channel. With the energy constraint, however, the access decision should take into account both the energy consumption characteristics and the channel fading statistics. For example, when the channel is idle but in a poor fading condition, should the user access this channel to gain immediate reward, or wait for a better channel realization that requires less transmission energy? We find that such a decision is a monotonic function of the channel fading condition.

1) Analytical Study:

Proposition 6: Given that a channel $a \in \{1, \ldots, N\}$ is sensed, the optimal access decision is monotonically increasing with the channel fading condition. Specifically, for any given residual energy $e \geq e_s + \varepsilon_1$, the optimal access decision is given by

$$\Phi_k^*(\lambda, e \mid a) = \begin{cases} 1, & \text{if } 0 < k \leq k_{\mathrm{th}}(\lambda, e) \\ 0, & \text{otherwise} \end{cases} \tag{28}$$

where $k_{\mathrm{th}}(\lambda, e) \in \{1, \ldots, L\}$ is the optimal access threshold. Furthermore, when $N = 1$ (i.e., the single-channel case), the threshold $k_{\mathrm{th}}(\lambda, e) = k_{\mathrm{th}}(e)$ is independent of the belief vector $\lambda$.

Proof: See Appendix E. ■

Note that the better the channel fading condition, the lower the sensing outcome. Proposition 6 indicates that the user should access when the channel is in good condition and refrain from accessing when the channel experiences deep fading. In particular, when the sensed channel is in the best fading condition (i.e., $\Theta(t) = 1$), the user should always access, i.e., $\Phi_1^*(\lambda, e \mid a) = 1$ for any $e \geq e_s + \varepsilon_1$.

Proposition 6 also reduces the size of the access decision space $\mathcal{G}(e \mid a)$ from exponential, $O(2^L)$, to linear, $O(L)$, in the number $L$ of power levels, leading to a more efficient search for the optimal access policy. Specifically, we can restrict our search for the optimal access decision to the following set:

$$\mathcal{G}'(e \mid a) = \big\{[\Phi_0, \Phi_1, \ldots, \Phi_L] : \Phi_k \in \mathcal{G}(e \mid a, k),\ \Phi_1 \geq \cdots \geq \Phi_L\big\} \tag{29}$$

where the size of $\mathcal{G}'(e \mid a)$ is on the order of $L$.
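The reduced set (29) contains only threshold vectors, so it can be enumerated by looping over candidate thresholds. The sketch below does this under the admissibility rule that $\Phi_k = 1$ requires $e \geq e_s + \varepsilon_k$ (and $\Phi_0 = 0$, since $k = 0$ means busy). The numbers ($L = 4$, $e = 4.0$, $e_s = 0.5$, $\varepsilon_k = k$) are assumptions, and the helper names are ours.

```python
def threshold_access(k, k_th):
    # Rule (28): access (1) iff 0 < k <= k_th; outcome k = 0 means the channel is busy.
    return 1 if 0 < k <= k_th else 0


def monotone_access_vectors(L, e, e_s, eps):
    """Enumerate the reduced set G'(e|a) of (29).

    Each vector [Phi_0, ..., Phi_L] has Phi_0 = 0, is non-increasing in k, and may set
    Phi_k = 1 only if e >= e_s + eps_k (eps is indexed so that eps[k - 1] = eps_k).
    """
    vectors = []
    for k_th in range(L + 1):  # k_th = 0 encodes "never access"
        phi = [threshold_access(k, k_th) for k in range(L + 1)]
        if all(e >= e_s + eps[k - 1] for k in range(1, L + 1) if phi[k]):
            vectors.append(phi)
    return vectors


vecs = monotone_access_vectors(L=4, e=4.0, e_s=0.5, eps=[1, 2, 3, 4])
print(len(vecs), "of", 2 ** 4)   # 4 of 16
for v in vecs:
    print(v)
```

At most $L + 1$ candidates are searched instead of $2^L$; here the threshold $k_{\mathrm{th}} = 4$ is excluded because $e < e_s + \varepsilon_4$.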

2) Numerical Study: For simplicity, we consider the single-channel case ($N = 1$) in the numerical study. As shown in Proposition 6, the optimal access threshold $k_{\mathrm{th}}(\lambda, e)$ in this case reduces to $k_{\mathrm{th}}(e)$, which is independent of the belief vector $\lambda$. In Fig. 3, we plot the optimal access threshold $k_{\mathrm{th}}(e)$ as a function of the residual energy $e$ for different sensing energy consumption $e_s$.

Similar to the behavior of the optimal sensing threshold $r_{\mathrm{th}}(e)$, the optimal access threshold $k_{\mathrm{th}}(e)$ may vary considerably when $e$ is small, but it reaches a common steady value when $e$ is sufficiently large. That is, the impact of $e$ on optimal access decisions diminishes as $e$ grows.
TABLE I
A Sample Path of the SOS Evolution and the Corresponding Optimal Sensing and Access Decisions ($e_p = 0.1$, $e_s = 0.5$, $\varepsilon_{\mathrm{tx}} \in \{1, 2, 3, 4\}$, $[p_1(1), p_1(2), p_1(3), p_1(4)] = [0.4, 0.3, 0.2, 0.1]$, $[p_2(1), p_2(2), p_2(3), p_2(4)] = [0.2, 0.3, 0.4, 0.1]$)

Time t | SOS S(t) | Belief vector | Residual energy E(t) | Sensing decision a(t) | Sensing outcome Θ(t) | Access decision Φ(t)
-------+----------+---------------+----------------------+-----------------------+----------------------+---------------------
1      | [1, 0]   | [0.5, 0.5]    | 8                    | 1                     | 1                    | 1
2      | [0, 1]   | [0.3, 0.5]    | 6.5                  | 0                     | -                    | -
3      | [1, 0]   | [0.58, 0.5]   | 6.4                  | 1                     | 2                    | 1
4      | [1, 1]   | [0.3, 0.5]    | 3.9                  | 0                     | -                    | -
5      | [0, 0]   | [0.58, 0.5]   | 3.8                  | 1                     | 0                    | 0
6      | [1, 0]   | [0.7, 0.5]    | 3.3                  | 1                     | 3                    | 0
7      | [0, 1]   | [0.3, 0.5]    | 2.8                  | 0                     | -                    | -
8      | [1, 0]   | [0.58, 0.5]   | 2.7                  | 1                     | 2                    | 1
9      | [0, 1]   | [0.3, 0.5]    | 0.2                  | -                     | -                    | -

We further see that the optimal access threshold $k_{\mathrm{th}}(e)$ increases with the sensing energy consumption $e_s$. Hence, when $e_s$ is small, the user should refrain from transmission under poor channel conditions and wait for a better channel realization. On the other hand, when $e_s$ is large, the user should be more aggressive in making access decisions: it should grab an instantaneous opportunity even when the channel is in a deep fade. This is because when $e_s$ is large, the sensing energy consumed in waiting for the best channel realization may exceed the extra energy consumed in combating poor channel fading.

C. A Sample Path

To further illustrate the behavior of the optimal sensing and access policies, we study an example of the SOS evolution and the corresponding optimal decisions. In Table I, we consider $N = 2$ independent channels with identical transition probabilities ($\alpha = 0.7$, $\beta = 0.3$) but different channel fading statistics. At the beginning of the first slot, the user operates in the sensing mode since its belief vector is given by the stationary distribution of the SOS, which agrees with Proposition 4. We find that, to conserve energy, the user never chooses the channel that is more prone to deep fading (i.e., Channel 2), even when this channel has a higher probability of being idle. This demonstrates the important role of channel fading statistics in deciding whether to sense. We also see that exploiting the channel occupancy dynamics allows the user to track spectrum opportunities efficiently. Specifically, when the channel is less likely to be idle, the user operates in the sleeping mode to save energy (see $t = 2, 4, 7$). It wakes up when the probability that the channel is idle is large.
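The belief-vector and residual-energy rows of Table I follow mechanically from the listed decisions and outcomes via (22)-(23) and the energy costs $e_p$, $e_s$, and $\varepsilon_k$. The sketch below replays them as a consistency check; it takes the optimal decisions from the table rather than computing them, and it treats the sensed channel's state as revealed, as (23) does.

```python
ALPHA, BETA = 0.7, 0.3                 # identical for both channels (Table I)
E_P, E_S = 0.1, 0.5                    # sleeping and sensing energy
EPS = {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0}  # transmission energy eps_k


def step(omega, a, k):
    # Belief update (22)/(23); channels are 1-indexed and a = 0 means sleep.
    new = []
    for n, w in enumerate(omega, start=1):
        if n == a:
            new.append(BETA if k > 0 else ALPHA)   # sensed channel is revealed
        else:
            new.append(ALPHA + (BETA - ALPHA) * w)
    return new


# (a(t), Theta(t), Phi(t)) for t = 1..8, copied from Table I.
decisions = [(1, 1, 1), (0, None, None), (1, 2, 1), (0, None, None),
             (1, 0, 0), (1, 3, 0), (0, None, None), (1, 2, 1)]

omega, energy = [0.5, 0.5], 8.0
history = [(omega, energy)]
for a, k, phi in decisions:
    if a == 0:
        energy -= E_P
    else:
        energy -= E_S + (EPS[k] if phi else 0.0)
    omega = step(omega, a, k)
    history.append((omega, energy))

for t, (w, e) in enumerate(history, start=1):
    print(t, [round(x, 2) for x in w], round(e, 1))
```

The final residual energy 0.2 is below $e_s + \varepsilon_1 = 1.5$, which is why no decision is listed for $t = 9$.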

VI. Bursty Traffic in Energy-Constrained OSA

In this section, we address the optimal distributed MAC design for energy-constrained OSA when the secondary user has bursty traffic. We show that in this case, the optimal sensing and access decisions should also take into account the traffic dynamics of the secondary user. We illustrate the impact of the secondary user's buffer state on the optimal operating decision.

A. Bursty Traffic Model

We assume that the packet arrival process is i.i.d. across slots, for example, a Poisson packet arrival process. Let $q_m$, $m = 0, 1, \ldots$, denote the probability that $m$ packets arrive in a slot. We assume that the user has a finite buffer with maximum size $M$. It receives packets in every slot, even when it operates in the sleeping mode. Packets are dropped when its buffer overflows. Let $D(t)$ denote the number of packets in the user's buffer at the beginning of slot $t$. Depending on the packet arrivals and departures, the buffer state $D(t)$ follows a Markov chain with state space $\{0 \text{ (empty)}, 1, \ldots, M\}$ and transition probabilities given by

$$P_D(d' \mid d, i) = \Pr\{D(t+1) = d' \mid D(t) = d,\ i \text{ packets were sent in slot } t\} = \sum_{m=0}^{\infty} q_m\, \mathbb{1}\big[d' = \min\{d - i + m, M\}\big], \quad d, d' \in \{0, 1, \ldots, M\}. \tag{30}$$

We assume that the transmission time of a packet over a channel with unit bandwidth is equal to the slot length. Hence, the number $i$ of packets transmitted over channel $a$ in a slot is either $0$ or $B_a$.
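For a concrete arrival law, (30) reduces to a finite row per $(d, i)$: arrivals that keep the buffer below $M$ shift the state by $m$, and all remaining probability mass lands in the full-buffer state $M$ (the dropped packets). The sketch below assumes Poisson arrivals, the example named in the text, and assumes $0 \leq i \leq d$ so that no more packets are sent than are buffered; the function names and numbers are ours.

```python
import math


def arrival_pmf(rate, m):
    # q_m for Poisson arrivals with mean `rate` packets per slot (an example model).
    return math.exp(-rate) * rate ** m / math.factorial(m)


def buffer_transition_row(d, i, M, rate):
    """Row [P_D(0|d,i), ..., P_D(M|d,i)] of (30), assuming 0 <= i <= d <= M."""
    if not 0 <= i <= d <= M:
        raise ValueError("need 0 <= i <= d <= M")
    row = [0.0] * (M + 1)
    base = d - i                      # packets left after sending i
    for m in range(M - base):         # arrivals that do not fill the buffer
        row[base + m] = arrival_pmf(rate, m)
    row[M] = 1.0 - sum(row[:M])       # m >= M - base arrivals: buffer full, rest dropped
    return row


row = buffer_transition_row(d=1, i=0, M=3, rate=1.0)
print([round(x, 4) for x in row])   # [0.0, 0.3679, 0.3679, 0.2642]
```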

B. POMDP Formulation

The POMDP framework developed in Section III-B for energy-constrained OSA design in the continuous traffic case can be extended to the bursty traffic case. Specifically, the new system is characterized by the following three components: i) the primary network's SOS $S(t)$; ii) the secondary user's residual energy $E(t)$; and iii) the secondary user's buffer state $D(t)$.

Sufficient Statistics: As explained in Section IV-E, to ensure synchronous hopping in the spectrum without extra control messages, the user (i.e., the transmitter) and its receiver must use common knowledge of the system state for decision-making in each slot. We note that while the user and its receiver can maintain the same belief vector $\lambda(t)$ and residual energy $E(t) = e$, the receiver does not know the exact buffer state $D(t)$ until notified by the user during the RTS-CTS exchange.§ Hence, when making sensing decisions (which occur before the RTS-CTS exchange), both the user and its receiver should treat the buffer state $D(t)$ as a partially observable parameter and use statistical information about $D(t)$. On the other hand, since both the user and its receiver know the exact buffer state after a successful RTS-CTS exchange, access decisions should be made by taking into account the exact buffer state $D(t)$. Let $\Phi_{k,d} \in \mathcal{G}(e \mid a, k, d)$ denote an admissible access decision under sensing outcome $\Theta(t) = k$ and buffer state $D(t) = d$, where

$$\mathcal{G}(e \mid a, k, d) \triangleq \begin{cases} \{0\}, & \text{if } k = 0 \text{ or } e < e_s + \varepsilon_k \text{ or } d = 0 \\ \{0, 1\}, & \text{otherwise.} \end{cases} \tag{31}$$

§ The secondary user can piggyback its buffer state $D(t)$ on the RTS message.

The statistical information about the buffer state $D(t)$ can be summarized by a conditional PMF $\xi(t) = [\xi_0(t), \xi_1(t), \ldots, \xi_M(t)]$, where $\xi_d(t) \in [0, 1]$ and $\sum_{d=0}^{M} \xi_d(t) = 1$. Each element $\xi_d(t)$ denotes the conditional probability (given the user's notifications of the buffer state) that the user's buffer state is $D(t) = d$ at the beginning of slot $t$. When the user operates in the sleeping mode (i.e., $a(t) = 0$) or the chosen channel is sensed as busy ($a(t) > 0$, $\Theta(t) = 0$), the user is unable to inform the receiver of its buffer state. Hence, the statistical information $\xi(t)$ about $D(t)$ is updated at both the user and its receiver based solely on the packet arrival process:

$$\xi(t+1) = \mathcal{P}\big(\xi(t) \mid 0\big)$$

where

$$\xi_d(t+1) = \sum_{d'=0}^{M} \xi_{d'}(t)\, P_D(d \mid d', 0). \tag{32}$$

When a channel $a(t) > 0$ is sensed as idle, the receiver knows the buffer state $D(t) = d$ from the user's RTS message. The statistical information about the buffer state can thus be updated based on the user's sensing decision $a(t)$ and access decision $\Phi_{k,d}(t) \in \mathcal{G}(e \mid a, k, d)$:

$$\xi(t+1) = \mathcal{P}\big(\xi(t) \mid a, k, d, \Phi_{k,d}\big)$$

where

$$\xi_{d'}(t+1) = P_D(d' \mid d, \Phi_{k,d} B_a). \tag{33}$$

Based on the above discussion, we see that in the bursty traffic case, the information state used for sensing and access decision-making consists of the belief vector $\lambda$, the residual energy $E(t) = e$, and the statistical information $\xi(t)$ on the buffer state. The design objective is thus given by

$$\max_{\{a(t),\ \Phi(t)\}} \ \mathbb{E}\Big[\text{total reward accumulated during the battery lifetime} \ \Big|\ \lambda(1),\ E(1) = e,\ \xi(1)\Big]. \tag{34}$$

C. Optimal Solution

We derive here the value function $V(\lambda, e, \xi)$ and the action-value function $Q_a(\lambda, e, \xi)$ for the POMDP given in (34). Following Section IV, we can readily obtain the maximum expected reward that can be achieved in the sleeping mode as

$$Q_0(\lambda, e, \xi) = V\big(T(\lambda \mid 0),\, e - e_p,\, \mathcal{P}(\xi \mid 0)\big) \tag{35}$$

where $\mathcal{P}(\xi \mid 0)$ is the updated knowledge of the buffer state given in (32).

Next, we derive the maximum expected reward $Q_a(\lambda, e, \xi)$ that can be achieved in the sensing mode. Consider the scenario where the secondary user chooses access decision $\Phi_{k,d} \in \mathcal{G}(e \mid a, k, d)$ under sensing outcome $\Theta = k$ when its buffer state is $D = d$. In this case, the maximum expected reward can be calculated as

$$Q_a(\lambda, e, \xi \mid k, d, \Phi_{k,d}) = \Phi_{k,d} B_a + V\big(T(\lambda \mid a, k),\, e - e_s - \Phi_{k,d}\,\varepsilon_k,\, \mathcal{P}(\xi \mid a, k, d, \Phi_{k,d})\big). \tag{36}$$

Optimizing over all admissible access decisions $\Phi_{k,d} \in \mathcal{G}(e \mid a, k, d)$ and then averaging over all sensing outcomes $\Theta = k$ with (18) and all buffer states $D = d$ with the current statistical information $\xi$, we obtain

$$Q_a(\lambda, e, \xi) = \sum_{d=0}^{M} \xi_d \sum_{k=0}^{L} O_a(k) \max_{\Phi_{k,d} \in \mathcal{G}(e \mid a, k, d)} Q_a(\lambda, e, \xi \mid k, d, \Phi_{k,d}). \tag{37}$$

With (35), (36), and (37), the value function can be obtained as

$$V(\lambda, e, \xi) = \max_{a \in \{0, \ldots, N\}} Q_a(\lambda, e, \xi). \tag{38}$$

We can readily generalize the solution procedure described in Section IV-D and calculate the above value function in increasing order of the residual energy $e$, starting from $e < e_s + \varepsilon_1$. After computing the value function, we can obtain the optimal sensing and access decisions as

$$a^*(\lambda, e, \xi) = \arg\max_{a \in \{0, \ldots, N\}} Q_a(\lambda, e, \xi), \qquad \Phi_{k,d}^*(\lambda, e, \xi \mid a) = \arg\max_{\Phi_{k,d} \in \mathcal{G}(e \mid a, k, d)} Q_a(\lambda, e, \xi \mid k, d, \Phi_{k,d}). \tag{39}$$

We  point  out  that  the  structures  of  the  value  function  devel¬ 
oped  in  Section  IV-C  and  the  threshold  structure  of  the  optimal 
access  policy  developed  in  Proposition  6  hold  for  the  bursty 
traffic  case.  The  structural  results  for  the  optimal  sensing  policy 
(i.e..  Proposition  4  and  5  and  Corollary  1),  however,  do  not  hold 
since  the  optimal  operating  decision  in  the  bursty  traffic  case  is 
highly  dependent  on  the  user’ s  buffer  state.  For  example,  we  find 
that  in  the  bursty  traffic  case,  the  user  may  choose  to  sleep  even 
if  its  belief  vector  is  given  by  the  stationary  distribution  of  the 
underlying  SOS  (contrary  to  Proposition  4).  This  happens  when 
the  probability  that  the  user  has  packets  to  transmit  is  small.  To 
avoid  wasting  sensing  energy,  the  user  and  its  receiver  should 
wait  until  the  buffer  is  more  likely  to  be  nonempty. 
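To make the buffer-state bookkeeping in (32) and (33) concrete, the sketch below propagates the PMF $\boldsymbol{\xi}$ in Python. It assumes a hypothetical buffer model that the text does not specify: in each slot, up to $\phi_{k,d}B_a$ packets are served first, then a Poisson($\rho$) number of packets arrives, and the occupancy is capped at $M$. The helper names (`buffer_transition`, `update_without_report`, `update_after_report`) are illustrative only.

```python
import math


def poisson_pmf(n, rate):
    """P{A = n} for a Poisson(rate) number of arrivals in one slot."""
    return math.exp(-rate) * rate ** n / math.factorial(n)


def buffer_transition(d_next, d, served, M, rate):
    """Hypothetical P_D(d_next | d, served): up to `served` packets leave the
    buffer, then Poisson(rate) packets arrive; occupancy is capped at M."""
    base = max(d - served, 0)
    if d_next < base:
        return 0.0
    if d_next < M:
        return poisson_pmf(d_next - base, rate)
    # d_next == M collects every arrival pattern that fills or overflows the buffer
    return 1.0 - sum(poisson_pmf(n, rate) for n in range(M - base))


def update_without_report(xi, M, rate):
    """Eq. (32): sleeping or busy channel, so nothing is served or reported."""
    return [sum(xi[dp] * buffer_transition(d, dp, 0, M, rate) for dp in range(M + 1))
            for d in range(M + 1)]


def update_after_report(d, phi, B_a, M, rate):
    """Eq. (33): the RTS revealed D(t) = d; phi * B_a packets may be served."""
    return [buffer_transition(dn, d, phi * B_a, M, rate) for dn in range(M + 1)]


if __name__ == "__main__":
    xi = update_without_report([1.0, 0.0, 0.0, 0.0], M=3, rate=0.5)
    print([round(p, 4) for p in xi])
    print([round(p, 4) for p in update_after_report(d=2, phi=1, B_a=1, M=3, rate=0.5)])
```

Starting from a known empty buffer, one slot without a report leaves $\xi_0=e^{-\rho}$ under this model, while (33) discards the old PMF entirely because the RTS reveals $D(t)$ exactly.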

D. Numerical Study: Optimal Operating Decision for Empty Buffer

It is interesting to note that even if the buffer is empty, the user may want to sense a channel in order to gain information about the SOS for future use, especially when the SOS is highly correlated in time. We study below the optimal operating decision (sleeping versus sensing) in the empty-buffer case.

Fig. 4. Optimal thresholds $\eta_{\mathrm{th}}$ on the SOS correlation. $e=30$ (initial energy), $B_1=B_2=1$ (bandwidth), $\Lambda(1)=[0.5,0.5]$ (initial belief vector), $e_p=0.1$ (sleeping energy), $\varepsilon_k\in\{1,2\}$ (transmission energy), $[p_i(1),p_i(2)]=[0.6,0.4]$, $i=1,2$ (channel fading statistics).
Consider two coupled channels ($N=2$) where the SOS is either $S(t)=[0,1]$ (i.e., only channel 2 is idle) or $S(t)=[1,0]$ (i.e., only channel 1 is idle). We assume that $P_S([1,0]\,|\,[0,1])=P_S([0,1]\,|\,[1,0])=\varpi$, i.e., the channel occupancy state changes with probability $\varpi$ in each slot. In this case, the correlation between the SOS in two successive slots can be characterized by a single parameter $\eta=1-2\varpi$. Extensive numerical results show that the optimal operating decision is monotonically increasing with the SOS time correlation $|\eta|$. Specifically, given the user's residual energy $e$, there exists a threshold $\eta_{\mathrm{th}}\in[0,1]$ such that

$$
a^*(\Lambda,e,\boldsymbol{\xi})
\begin{cases}
>0\ \text{(sensing)}, & \text{if } |\eta|>\eta_{\mathrm{th}},\\
=0\ \text{(sleeping)}, & \text{otherwise}.
\end{cases}
\qquad (40)
$$
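The role of $\eta=1-2\varpi$ can be checked with a few lines of Python. For the symmetric two-state SOS above, $\eta$ is exactly the lag-one correlation of the indicator that channel 1 is idle, and after $n$ slots without sensing the belief relaxes toward $1/2$ as $\eta^n$. A sensing outcome therefore stays informative for longer when $|\eta|$ is close to 1, which is consistent with the threshold rule (40). The function names are illustrative, not taken from the paper.

```python
def sos_correlation(varpi):
    """Lag-one correlation of X(t) = 1{S(t) = [1, 0]} for the symmetric chain.

    The stationary distribution is [0.5, 0.5], so E[X] = 0.5 and Var[X] = 0.25,
    and E[X(t) X(t+1)] = P(X(t) = 1) * P(X(t+1) = 1 | X(t) = 1) = 0.5 * (1 - varpi).
    """
    return (0.5 * (1 - varpi) - 0.25) / 0.25


def predict(lam, varpi):
    """Belief that S(t+1) = [1, 0] after one slot without a sensing outcome."""
    return lam * (1 - varpi) + (1 - lam) * varpi


def belief_after(lam0, varpi, slots):
    """Belief after `slots` consecutive slots of sleeping."""
    lam = lam0
    for _ in range(slots):
        lam = predict(lam, varpi)
    return lam


if __name__ == "__main__":
    for varpi in (0.05, 0.5, 0.95):
        eta = sos_correlation(varpi)
        print(varpi, round(eta, 3), [round(belief_after(1.0, varpi, n), 3) for n in range(4)])
```

With $\varpi$ near 0 or near 1 the belief stays far from $1/2$ for several slots (it oscillates when $\eta<0$), so information gathered now remains useful; with $\varpi=1/2$ it is lost after one slot.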


We assume a Poisson packet arrival process. In Fig. 4, we plot the threshold $\eta_{\mathrm{th}}$ on the SOS correlation as a function of the packet arrival rate $\rho$ for different values of the sensing energy consumption $e_s$. We see that the threshold $\eta_{\mathrm{th}}$ decreases with the packet arrival rate $\rho$. Intuitively, when $\rho$ is large, there is a high probability that packets will arrive in this slot, and hence the user should be more active in collecting information about the SOS for better channel selection in the next slot. As the packet arrival rate $\rho$ keeps increasing, the threshold $\eta_{\mathrm{th}}$ approaches zero, i.e., the user should always sense a channel. This observation agrees with Proposition 4, since the continuous traffic case corresponds to $\rho\to\infty$. As expected, the threshold $\eta_{\mathrm{th}}$ also increases with the sensing energy consumption $e_s$. As the sensing cost $e_s$ increases, the user with an empty buffer tends to operate in the sleeping mode; it senses a channel only when the resulting sensing outcome can provide more information about the SOS, i.e., when the time correlation of the SOS is high.
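The intuition that a larger arrival rate makes sensing worthwhile can be quantified under the Poisson assumption stated above. With a known empty buffer, the probability that at least one packet arrives within the next slot is $1-e^{-\rho}$, which tends to one as the rate grows (the continuous-traffic limit). The snippet additionally assumes that arrivals are independent across slots; the function name is hypothetical.

```python
import math


def prob_no_arrival(rate, slots=1):
    """P{no packet arrives during `slots` slots}, assuming i.i.d. Poisson(rate)
    arrivals per slot. Starting from a known empty buffer and without buffer
    reports, this is also the probability that the buffer is still empty."""
    return math.exp(-rate * slots)


if __name__ == "__main__":
    for rate in (0.05, 0.5, 2.0, 5.0):
        p = 1 - prob_no_arrival(rate)
        print(f"rate={rate}: P(at least one arrival in the next slot) = {p:.3f}")
```

For small rates the user can afford to wait, since the buffer very likely stays empty; for large rates almost every slot carries traffic, which matches the observed decrease of $\eta_{\mathrm{th}}$ with $\rho$.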

Footnote: Similar observations are obtained for the case when the probability $\xi_0(t)=\Pr\{D(t)=0\}$ of an empty buffer is close to 1.


VII.  Conclusion  and  Discussions 

Within the POMDP framework, we have developed optimal distributed MAC protocols for energy-constrained OSA under both the continuous and the bursty traffic models. To study the fundamental design tradeoffs, we have established that the optimal sensing and access policies have threshold structures. We have also provided numerical examples to study the impact of different factors that affect the optimal decisions. We find that the residual energy has a more significant impact on the optimal sensing and access decisions when the battery is close to depletion or the channel occupancy state is negatively correlated in time. When the sensing cost is high, the secondary user should be more conservative in sensing but more aggressive in accessing the channel. Interestingly, we also find that even if a secondary user does not have any packet to send in the current slot, it should still choose to sense a channel when the time correlation of the channel occupancy state is large. These results provide not only insights into the energy-constrained OSA design but also guidelines for suboptimal designs.

We have assumed that secondary users have perfect knowledge of the statistical model of the spectrum usage. We take the viewpoint that such statistical models of a particular spectrum region should be obtained through measurements before the deployment of secondary networks in that spectrum region. This is for the purpose of evaluating the potential gain or profit of a secondary market in that spectrum region. Such statistical models can then be made available to secondary users to facilitate design. We are, however, aware that in some scenarios secondary users may not have access to spectrum usage models. In this case, we have a POMDP with an unknown model, and existing reinforcement learning algorithms may be borrowed [21].

We have not considered sensing errors in this paper. When a secondary user may mistake a busy channel for an idle one and vice versa, the joint design of the access strategy and the operating characteristics of the spectrum sensor is crucial in order to minimize overlooked spectrum opportunities without violating the interference constraint. This issue has been fully addressed in [5], [24] in the absence of energy constraints. The impact of sensing errors on energy-constrained OSA design is one of the future directions. In particular, how to exploit the RTS-CTS exchange to combat sensing errors and to ensure synchronous hopping is worth investigating. Another interesting extension is to consider a scenario where batteries could be slowly recharged.

The  interaction  among  secondary  users  has  not  been  taken 
into  account.  The  sensing  and  access  protocols  proposed  in 
this  paper  can  be  applied  to  a  network  of  secondary  users. 
Their  performance  is,  however,  suboptimal  in  terms  of  network 
throughput.  Preliminary  results  on  spectrum  sharing  among 
distributed  competing  secondary  users  have  been  obtained 
in  [22]  without  considering  energy  constraints.  We  hope  that 
the  proposed  optimal  single-user  energy-constrained  MAC 
protocols  provide  insights  for  the  design  of  multiuser  OSA  with 
energy  constraint. 


Appendix  A 
Proof  of  Proposition  1 

As explained in Section IV-D, given any initial energy $e$ and any initial belief vector $\Lambda$, the secondary user can only experience a finite number of information states $(\Lambda,e)$ during its entire battery lifetime. Hence, an energy-constrained OSA problem can be viewed as an MDP with a finite state space consisting of all possible information states. Moreover, the immediate reward defined in (14) is nonnegative. This, together with the inevitable termination, makes the energy-constrained OSA design an example of a stochastic shortest path problem. Furthermore, the strictly positive sleeping energy makes the state transition of the resulting stochastic shortest path problem acyclic (i.e., loop-free).

The key to understanding the existence of stationary optimal policies is to note that the residual energy of the secondary user is part of the system state. Since the residual energy determines the remaining lifetime, the system state contains all the time-dependent information for decision-making. The optimal actions thus depend only on the system state and are stationary in time.
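The acyclic structure is what makes the value function computable by a plain recursion on residual energy. The sketch below illustrates this for the single-channel recursion in the form used in Appendices B and D (sleep: $\omega\to\alpha+(\beta-\alpha)\omega$; sense: busy with probability $1-\omega$, idle with probability $\omega$). All parameter values (`ALPHA`, `BETA`, `E_P`, `E_S`, `EPS`, `P_K`, `B`) and the integer energy units are hypothetical, chosen only for illustration. Every recursive call to `V` uses strictly less energy, so memoized recursion terminates.

```python
from functools import lru_cache

# Hypothetical single-channel (N = 1) parameters; energies in integer units.
ALPHA, BETA = 0.2, 0.8   # P(idle next | busy now), P(idle next | idle now)
E_P, E_S = 1, 2          # sleeping and sensing energy (E_S > E_P)
EPS = {1: 2, 2: 4}       # transmission energy for fading state k
P_K = {1: 0.6, 2: 0.4}   # fading-state probabilities given an idle channel
B = 1.0                  # reward for one successful transmission


@lru_cache(maxsize=None)
def V(omega, e):
    """Value function V(omega, e); each call to V below has less energy than e."""
    if e < E_S + min(EPS.values()):
        return 0.0  # no transmission can ever be afforded again
    q_sleep = V(round(ALPHA + (BETA - ALPHA) * omega, 12), e - E_P)
    q_sense = (1 - omega) * V(ALPHA, e - E_S) + omega * G(e)
    return max(q_sleep, q_sense)


@lru_cache(maxsize=None)
def G(e):
    """Expected reward-to-go after sensing the channel idle (cf. G(e) in (44))."""
    total = 0.0
    for k, pk in P_K.items():
        best = V(BETA, e - E_S)  # phi_k = 0: sense but do not transmit
        if e >= E_S + EPS[k]:    # phi_k = 1 is admissible
            best = max(best, B + V(BETA, e - E_S - EPS[k]))
        total += pk * best
    return total


if __name__ == "__main__":
    for e in range(0, 13, 2):
        print(e, round(V(0.5, e), 4))
```

Rounding the predicted belief only keeps the memoization keys stable; the recursion visits finitely many (belief, energy) pairs, in line with the argument above.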


Appendix  B 

Proof  of  Proposition  2 

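Proof of P2.1: Note that a secondary user with a higher residual energy can always act as if it had a lower residual energy. Hence, the secondary user with a larger initial energy earns no less reward. ■

Proof of P2.2: We prove P2.2 by induction over residual energies $e$. Specifically, for the lowest possible residual energy $e=\min\mathcal{E}$, the value function of any information state is $V(\Omega,e)=0$ and hence P2.2 holds. Suppose that it holds for all possible residual energies $e'\in\mathcal{E}$ lower than $e$. Since $\Omega\geq\Omega'$ implies $\mathcal{T}(\Omega\,|\,0)\geq\mathcal{T}(\Omega'\,|\,0)$, as seen from (22), we obtain from (25a) that $Q_0(\Omega,e)\geq Q_0(\Omega',e)$. Next, we show that $Q_a(\Omega,e)\geq Q_a(\Omega',e)$ for $\Omega\geq\Omega'$. We note that when $\Omega\geq\Omega'$, we have $\mathcal{T}(\Omega\,|\,a,k)\geq\mathcal{T}(\Omega'\,|\,a,k)$ from (23) and hence $Q_a(\Omega,e\,|\,k,\phi_k)\geq Q_a(\Omega',e\,|\,k,\phi_k)$ from (25c). Since $\mathcal{T}(\Omega'\,|\,a,k)\geq\mathcal{T}(\Omega'\,|\,a,0)$, as seen from (23), we have $Q_a(\Omega',e\,|\,k,\phi_k)\geq Q_a(\Omega',e\,|\,0,0)=V(\mathcal{T}(\Omega'\,|\,a,0),e-e_s)$ from (25c). Using (25b), we then obtain that

$$
\begin{aligned}
Q_a(\Omega,e)
&\geq(1-\omega_a)\,V\big(\mathcal{T}(\Omega'\,|\,a,0),e-e_s\big)+\omega_a\sum_{k=1}^{L}p_a(k)\max_{\phi_k\in\mathcal{G}(e|a,k)}Q_a(\Omega',e\,|\,k,\phi_k)\\
&\geq V\big(\mathcal{T}(\Omega'\,|\,a,0),e-e_s\big)+\omega'_a\Big[\sum_{k=1}^{L}p_a(k)\max_{\phi_k\in\mathcal{G}(e|a,k)}Q_a(\Omega',e\,|\,k,\phi_k)-V\big(\mathcal{T}(\Omega'\,|\,a,0),e-e_s\big)\Big]\\
&=(1-\omega'_a)\,V\big(\mathcal{T}(\Omega'\,|\,a,0),e-e_s\big)+\omega'_a\sum_{k=1}^{L}p_a(k)\max_{\phi_k\in\mathcal{G}(e|a,k)}Q_a(\Omega',e\,|\,k,\phi_k)=Q_a(\Omega',e)
\end{aligned}
$$

where the second inequality holds because $\omega_a\geq\omega'_a$, $\sum_{k=1}^{L}p_a(k)=1$, and each maximum in the brackets is at least $V(\mathcal{T}(\Omega'\,|\,a,0),e-e_s)$, so the bracketed term is nonnegative. Hence, by (24), we have $V(\Omega,e)\geq V(\Omega',e)$, which completes the proof. ■


Appendix C

Proof of Proposition 3

The proof of Proposition 3 is very similar to that provided in [15] for a POMDP with a finite and fixed time horizon. Hence, we only briefly describe the procedure for this proof.

For any residual energy $e<e_s+\varepsilon_1$, we have $V(\Lambda,e)=0$, which can be written as an inner product of the belief vector $\Lambda$ and an all-zero $\Upsilon$-vector. Suppose that Proposition 3 holds for all residual energies $e'\in\mathcal{E}$ that are lower than $e$. After some algebra, we can rewrite the action-value functions given in (17) and (20) in terms of the $\Upsilon$-vectors:

$$
Q_0(\Lambda,e)=\max_{\Upsilon\in\Gamma_{e-e_p}}\big\langle\mathcal{T}(\Lambda\,|\,0),\Upsilon\big\rangle
=\sum_{s\in\mathcal{S}}\lambda_s\Big[\sum_{s'\in\mathcal{S}}P_S(s'\,|\,s)\,\Upsilon^{(\Lambda,0)}_{s'}\Big]
\qquad (41)
$$

$$
\begin{aligned}
Q_a(\Lambda,e)
&=\max_{\boldsymbol{\phi}\in\mathcal{G}(e|a)}\sum_{s\in\mathcal{S}}\lambda_s\sum_{k=0}^{L}\upsilon_a(k\,|\,s)\Big(B_a\phi_k+\max_{\Upsilon\in\Gamma_{e-e_s-\phi_k\varepsilon_k}}\big\langle\mathcal{T}(\Lambda\,|\,a,k),\Upsilon\big\rangle\Big)\\
&=\max_{\boldsymbol{\phi}\in\mathcal{G}(e|a)}\sum_{s'\in\mathcal{S}}\lambda_{s'}\Big[\sum_{k=0}^{L}\upsilon_a(k\,|\,s')\Big(B_a\phi_k+\sum_{s\in\mathcal{S}}P_S(s\,|\,s')\,\Upsilon^{(\Lambda,a,k)}_{s}\Big)\Big]
\end{aligned}
\qquad (42)
$$

where $\Upsilon^{(\Lambda,0)}$ and $\Upsilon^{(\Lambda,a,k)}$ are, respectively, the $\Upsilon$-vectors associated with the regions containing the belief vectors $\mathcal{T}(\Lambda\,|\,0)$ and $\mathcal{T}(\Lambda\,|\,a,k)$. Viewing each term in the square brackets of (41) and (42) as an element $\Upsilon_{e,s}$ of a possible $\Upsilon$-vector $\Upsilon_e$, we find that the action-value functions can be written as an inner product of the belief vector and a $\Upsilon$-vector $\Upsilon_e$. Moreover, there are only a finite number of such $\Upsilon$-vectors $\Upsilon_e$, since we have assumed that the sets $\Gamma_{e'}$ are finite for all $e'<e$. Since the maximum of a finite set of piecewise linear and convex functions is also piecewise linear and convex, Proposition 3 holds.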
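A small numerical check of the sleeping-mode part of (41): pushing each $\Upsilon$-vector of $\Gamma_{e-e_p}$ back through the transition matrix $P_S$ yields vectors whose upper envelope reproduces $Q_0$ directly on the current belief, and that envelope is convex. The two-state transition matrix `P` and the vector set `GAMMA_NEXT` are hypothetical; only the backup rule is taken from (41).

```python
# Hypothetical two-state SOS: P[s][s2] = P_S(s2 | s).
P = [[0.9, 0.1],
     [0.3, 0.7]]
# Hypothetical finite set Gamma_{e - e_p} of Upsilon-vectors (one entry per state).
GAMMA_NEXT = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]]


def inner(u, v):
    """Inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))


def predict(belief, P):
    """T(Lambda | 0): one-step prediction of the belief with no observation."""
    n = len(P)
    return [sum(belief[s] * P[s][s2] for s in range(n)) for s2 in range(n)]


def sleep_backup(P, gamma_next):
    """(41): map each Upsilon' to the vector with entries sum_{s2} P_S(s2|s) Upsilon'_{s2}."""
    n = len(P)
    return [[inner(P[s], ups) for s in range(n)] for ups in gamma_next]


def upper_envelope(belief, gamma):
    """max over a finite vector set of <belief, Upsilon>: piecewise linear and convex."""
    return max(inner(belief, ups) for ups in gamma)


GAMMA_SLEEP = sleep_backup(P, GAMMA_NEXT)

if __name__ == "__main__":
    lam = [0.4, 0.6]
    print(upper_envelope(predict(lam, P), GAMMA_NEXT), upper_envelope(lam, GAMMA_SLEEP))
```

Both printed values agree because $\langle\mathcal{T}(\Lambda\,|\,0),\Upsilon'\rangle=\langle\Lambda,P_S\Upsilon'\rangle$ term by term; the sensing-mode backup in (42) follows the same pattern with the observation probabilities folded in.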


Appendix  D 

Proof  of  Propositions  4-5  and  Corollary  1 

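Proof of Proposition 4: We prove by induction that $a^*(\mathring{\Lambda},e)\neq 0$, i.e., $V(\mathring{\Lambda},e)=\max_{a\in\{1,\dots,N\}}Q_a(\mathring{\Lambda},e)$. Clearly, $V(\mathring{\Lambda},e)=Q_a(\mathring{\Lambda},e)=0$ holds for any $a\in\{1,\dots,N\}$ when $e=\min\mathcal{E}$. Suppose that this equality holds for all residual energies $e'\in\mathcal{E}$ lower than $e$. Since $\mathring{\Lambda}$ is the stationary distribution of the underlying SOS, we have $\mathcal{T}(\mathring{\Lambda}\,|\,0)=\mathring{\Lambda}$. We thus obtain from (16) and (17) that

$$
\begin{aligned}
V(\mathring{\Lambda},e)
&=\max\Big\{V(\mathring{\Lambda},e-e_p),\ \max_{a\in\{1,\dots,N\}}Q_a(\mathring{\Lambda},e)\Big\}\\
&=\max\Big\{\max_{a\in\{1,\dots,N\}}Q_a(\mathring{\Lambda},e-e_p),\ \max_{a\in\{1,\dots,N\}}Q_a(\mathring{\Lambda},e)\Big\}\\
&=\max_{a\in\{1,\dots,N\}}Q_a(\mathring{\Lambda},e)
\end{aligned}
\qquad (43)
$$

where the second equality follows from the induction hypothesis and the last equality is due to the monotonicity of the value function in terms of the residual energy. Proposition 4 thus follows. ■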
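The key step $\mathcal{T}(\mathring{\Lambda}\,|\,0)=\mathring{\Lambda}$ can be checked numerically. The sketch computes the stationary distribution of a hypothetical two-state channel by power iteration and compares its idle probability with the closed form $\alpha/(1+\alpha-\beta)$ used in the proof of Proposition 5. The values of `ALPHA` and `BETA` are assumptions for illustration.

```python
def predict(belief, P):
    """T(Lambda | 0): one-step belief prediction under P[s][s2] = P(s2 | s)."""
    n = len(P)
    return [sum(belief[s] * P[s][s2] for s in range(n)) for s2 in range(n)]


def stationary(P, iters=10_000, tol=1e-14):
    """Power iteration for the stationary distribution of an ergodic chain."""
    lam = [1.0 / len(P)] * len(P)
    for _ in range(iters):
        nxt = predict(lam, P)
        if max(abs(a - b) for a, b in zip(nxt, lam)) < tol:
            return nxt
        lam = nxt
    return lam


ALPHA, BETA = 0.1, 0.7   # hypothetical P(idle | busy), P(idle | idle)
# State 0 = busy, state 1 = idle.
P = [[1 - ALPHA, ALPHA],
     [1 - BETA, BETA]]
lam_ring = stationary(P)
omega_ring = ALPHA / (1 + ALPHA - BETA)

if __name__ == "__main__":
    print(lam_ring, omega_ring)
```

Because sleeping leaves $\mathring{\Lambda}$ unchanged, sleeping at the stationary belief only burns $e_p$ without any information gain, which is the intuition behind (43).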

Proof of Proposition 5: We consider the single-channel case ($N=1$) and adopt the value function defined in (24) and (25). We note that when $N=1$, the belief vector $\Lambda(t)$ reduces to a scalar $\omega(t)$ as defined in (21), and the corresponding belief update $\mathcal{T}(\omega(t)\,|\,a,k)$ under sensing outcome $k$ from this channel reduces from (23) to $\omega(t+1)=\beta$ if $k>0$ and $\omega(t+1)=\alpha$ if $k=0$, which is independent of the current belief $\omega(t)$.

Lemma 1: Consider the single-channel case ($N=1$). Given the current residual energy $e$, we have that for any belief $\omega$,

$$
G(e)\triangleq\sum_{k=1}^{L}p(k)\max_{\phi_k\in\mathcal{G}(e|1,k)}Q_1(\omega,e\,|\,k,\phi_k)\geq V(\omega,e-e_p).
\qquad (44)
$$

Proof: We prove this lemma by induction. For any residual energy $e<e_s+\varepsilon_1$, the value function of any information state is $V(\omega,e)=0$ and hence (44) holds. Suppose that it holds for all possible residual energies $e'\in\mathcal{E}$ lower than $e$. Then, applying (25) to (24), we obtain that

$$
\begin{aligned}
V(\omega,e-e_p)
&=\max\big\{V(\alpha+(\beta-\alpha)\omega,\,e-2e_p);\ (1-\omega)V(\alpha,\,e-e_p-e_s)+\omega G(e-e_p)\big\}\\
&\leq\max\big\{G(e-e_p);\ (1-\omega)G(e-e_s)+\omega G(e-e_p)\big\}\leq G(e-e_p)\leq G(e)
\end{aligned}
\qquad (45)
$$

where the first inequality follows from the induction hypothesis, and the last two inequalities follow from the fact that $e_s>e_p$ and the value function is monotonically increasing with the residual energy. This completes the proof of (44). ■

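Suppose that the optimal sensing action is $a^*(\omega_1,e)=1$ (i.e., sensing) when the current belief is $\omega_1$. That is, $Q_1(\omega_1,e)\geq Q_0(\omega_1,e)$. Consider any belief $\omega_2$ such that $\omega_2\geq\omega_1$. We obtain from (25b) that

$$
\begin{aligned}
Q_1(\omega_2,e)&=(1-\omega_2)V(\alpha,e-e_s)+\omega_2G(e)\\
&=\frac{1-\omega_2}{1-\omega_1}Q_1(\omega_1,e)+\frac{\omega_2-\omega_1}{1-\omega_1}G(e)\\
&\geq\frac{1-\omega_2}{1-\omega_1}Q_0(\omega_1,e)+\frac{\omega_2-\omega_1}{1-\omega_1}G(e).
\end{aligned}
\qquad (46)
$$

Applying (25a) and (44) to (46), we obtain that

$$
Q_1(\omega_2,e)\geq\frac{1-\omega_2}{1-\omega_1}V\big(\alpha+(\beta-\alpha)\omega_1,\,e-e_p\big)+\frac{\omega_2-\omega_1}{1-\omega_1}V(\beta,\,e-e_p).
\qquad (47)
$$

Since the value function $V(\omega,e)$ is convex in the belief $\omega$, we obtain from (47) that

$$
\begin{aligned}
Q_1(\omega_2,e)&\geq V\Big(\frac{1-\omega_2}{1-\omega_1}\big[\alpha+(\beta-\alpha)\omega_1\big]+\frac{\omega_2-\omega_1}{1-\omega_1}\beta,\ e-e_p\Big)\\
&=V\big(\alpha+(\beta-\alpha)\omega_2,\,e-e_p\big)=Q_0(\omega_2,e).
\end{aligned}
\qquad (48)
$$

Hence, the optimal sensing action for $\omega_2$ is $a^*(\omega_2,e)=1$. That is, $a^*(\omega,e)$ is monotonically increasing in $\omega$.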

We see from (22) and (23) that the belief $\omega\in[\min\{\alpha,\beta\},\max\{\alpha,\beta\}]$. Furthermore, by Proposition 4, the optimal sensing action is given by $a^*(\mathring{\omega},e)=1$, where $\mathring{\omega}=\alpha/(1+\alpha-\beta)$ is the stationary distribution of the SOS. Hence, the threshold $\tau_{\mathrm{th}}(e)$ is upper bounded by $\alpha/(1+\alpha-\beta)$. ■
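The threshold structure of Proposition 5 can be observed numerically on a toy instance of the single-channel recursion. The sketch below evaluates $Q_0$ and $Q_1$ on a belief grid, checks that once sensing becomes optimal it stays optimal for larger beliefs, and checks that sensing is optimal at $\mathring{\omega}=\alpha/(1+\alpha-\beta)$. All parameter values, the integer energy units, and the tie tolerance `TOL` are hypothetical; this is an illustration, not the numerical setup of the paper.

```python
from functools import lru_cache

# Hypothetical single-channel (N = 1) parameters in integer energy units.
ALPHA, BETA = 0.3, 0.6   # P(idle next | busy now), P(idle next | idle now)
E_P, E_S = 1, 2          # sleeping and sensing energy (E_S > E_P, as Lemma 1 needs)
EPS = {1: 2, 2: 4}       # transmission energy for fading state k
P_K = {1: 0.6, 2: 0.4}   # fading-state probabilities given an idle channel
B = 1.0                  # reward per successful transmission
TOL = 1e-9               # ties (up to rounding) are broken toward sensing


@lru_cache(maxsize=None)
def V(omega, e):
    if e < E_S + min(EPS.values()):
        return 0.0
    return max(Q0(omega, e), Q1(omega, e))


def Q0(omega, e):
    """Sleep (cf. (25a)): the belief drifts to alpha + (beta - alpha) * omega."""
    return V(round(ALPHA + (BETA - ALPHA) * omega, 12), e - E_P)


def Q1(omega, e):
    """Sense (cf. (25b)): busy w.p. 1 - omega (belief -> alpha), idle w.p. omega."""
    return (1 - omega) * V(ALPHA, e - E_S) + omega * G(e)


@lru_cache(maxsize=None)
def G(e):
    """Expected reward-to-go after an idle outcome, G(e) in (44)."""
    total = 0.0
    for k, pk in P_K.items():
        best = V(BETA, e - E_S)
        if e >= E_S + EPS[k]:
            best = max(best, B + V(BETA, e - E_S - EPS[k]))
        total += pk * best
    return total


def decisions(e, grid=101):
    """True where sensing is optimal on a uniform grid of beliefs in [0, 1]."""
    return [Q1(i / (grid - 1), e) >= Q0(i / (grid - 1), e) - TOL for i in range(grid)]


def threshold(e, grid=101):
    """Smallest grid belief at which sensing is optimal (None if never)."""
    d = decisions(e, grid)
    return d.index(True) / (grid - 1) if True in d else None


if __name__ == "__main__":
    omega_ring = ALPHA / (1 + ALPHA - BETA)
    for e in range(4, 16, 2):
        print(e, threshold(e), round(omega_ring, 4))
```

On a grid the reported threshold can exceed the true one by up to one grid step, which is why the bound is checked directly at $\mathring{\omega}$ rather than on the grid.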

Proof of Corollary 1: When $\alpha=\beta$, we have $\omega=\alpha$, as seen from (22) and (23). By Proposition 5, the sensing threshold satisfies $\tau_{\mathrm{th}}(e)\leq\alpha/(1+\alpha-\beta)=\alpha$, and hence $a^*(\omega,e)=1$. ■


Appendix  E 

Proof  of  Proposition  6 

Consider the case where the secondary user operates in the sensing mode and observes sensing outcome $\Theta(t)>0$. Inspection of (6) and (13) reveals that the belief update $\mathcal{T}(\Lambda\,|\,a,k)$ is independent of $k$ when $k>0$. Hence, $Q_a(\Lambda,e\,|\,k,0)$ is identical for all positive $k$. It thus suffices to show that $Q_a(\Lambda,e\,|\,k,1)$ is monotonically decreasing with $k$. This follows straightforwardly from (19) and the monotonicity of the value function with the residual energy.

Furthermore, when $N=1$, the updated belief $\mathcal{T}(\Lambda\,|\,a,k)$ is determined solely by the current observation. The action-value function $Q_a(\Lambda,e\,|\,k,\phi_k)$ given in (19) and, hence, the optimal access decision are thus independent of the current belief vector.
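The argument above reduces to comparing $B+V(e-e_s-\varepsilon_k)$ with $V(e-e_s)$ for each fading state $k$, with the belief update common to all $k>0$. The sketch below makes that comparison with a hypothetical stand-in `value_of_energy` (any function nondecreasing in residual energy would do) and hypothetical transmission energies that grow with $k$; the resulting access decisions form a threshold in $k$, as Proposition 6 states.

```python
import math

E_S = 1.0                          # hypothetical sensing energy
B = 1.0                            # hypothetical reward per transmission
EPS = [None, 0.5, 1.5, 3.0, 6.0]   # hypothetical eps_k for k = 1..4 (index 0 unused)


def value_of_energy(e):
    """Hypothetical stand-in for V(T(Lambda | a, k), e): nondecreasing in e."""
    return math.sqrt(max(e, 0.0))


def access_decisions(e):
    """phi_k* for k = 1..L: transmit iff affordable and B + V(e - e_s - eps_k) >= V(e - e_s)."""
    out = []
    for k in range(1, len(EPS)):
        if e < E_S + EPS[k]:       # phi_k = 1 is not admissible
            out.append(0)
            continue
        q_transmit = B + value_of_energy(e - E_S - EPS[k])
        q_skip = value_of_energy(e - E_S)
        out.append(1 if q_transmit >= q_skip else 0)
    return out


if __name__ == "__main__":
    for e in (2.0, 5.0, 10.0, 20.0):
        print(e, access_decisions(e))
```

Since both the affordability condition and $B+V(e-e_s-\varepsilon_k)$ can only get worse as $\varepsilon_k$ grows, the decisions are nonincreasing in $k$ for every residual energy.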